Raw Data Sizing

Jan 01 2014

I have been asked multiple times how much disk space “a” performance management system will take for a given set of business requirements.

Well, this depends first on the implementation of the data schema. One system might persist just the name-value pairs in the raw data file, while another introduces extra columns such as last poll time, unit, etc., and needs more space.

Most PM systems maintain the raw records for a period of time in order to summarize data. If, for example, the first summarization is at the hourly level, at least 1 hour of raw data must be maintained. After the summarization, the raw data can be purged. The higher-level summarizations then use the summarization one step below.
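The summarize-then-purge cycle can be sketched in a few lines of Python. This is only an illustration: the record layout (kpi_id, epoch seconds, value), the function names, and the choice of a plain average as the aggregation are my assumptions, not how any particular PM product does it.

```python
from collections import defaultdict
from statistics import mean


def summarize_hourly(raw_records):
    """Average raw (kpi_id, epoch_seconds, value) records per KPI per hour.

    The average is an assumed aggregation; real systems may also keep
    min, max, sum, or percentiles.
    """
    buckets = defaultdict(list)
    for kpi_id, ts, value in raw_records:
        hour_start = ts - ts % 3600  # truncate the timestamp to its hour
        buckets[(kpi_id, hour_start)].append(value)
    return {key: mean(values) for key, values in buckets.items()}


def purge_raw(raw_records, cutoff_ts):
    """Keep only the raw records with a timestamp at or after cutoff_ts."""
    return [record for record in raw_records if record[1] >= cutoff_ts]


# Hypothetical samples: two polls in the first hour, one in the second.
raw = [(1, 0, 10.0), (1, 60, 20.0), (1, 3600, 30.0)]
hourly = summarize_hourly(raw)
remaining = purge_raw(raw, cutoff_ts=3600)
```

A daily summarization would then read from the hourly result rather than from the raw records.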

Because of late data arrival or manual data insertion, PM implementations usually keep the raw data for at least one day, since the raw data is needed to recalculate the summarizations. Regulatory requirements may also dictate the retention period of the raw data.

But how much disk space will be occupied if we retain 1 day of raw data? The math is simple and I will try to explain it below. However, you have to take some side factors into account. For example, the solution can utilize file compression, which can reduce the required size down to 20% of the uncompressed size.

A very rough sizing excluding the compression factor is below:

The customer's requirements are: 100 devices on the network, each device with at least 3 interfaces, and 1 month of raw data retention.

So let's begin:

1 KPI must occupy at least 2 columns in the database/flat file, excluding metadata:
KPI Id Column: Integer: 4 bytes
KPI Value Column: Double: 8 bytes
Total: 12 bytes per KPI
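To double-check the 12 bytes, here is a small Python sketch that packs one KPI record with the standard struct module. The packed little-endian layout and the sample values are my assumptions; a real database or file format will add its own overhead on top.

```python
import struct

# "<" selects little-endian with no padding: "i" is a 4-byte integer
# (KPI id) and "d" is an 8-byte double (KPI value).
KPI_RECORD = struct.Struct("<id")

# Hypothetical sample: KPI id 42 with value 73.5.
packed = KPI_RECORD.pack(42, 73.5)
kpi_id, kpi_value = KPI_RECORD.unpack(packed)

print(KPI_RECORD.size)  # 12
```

Note that with native alignment (format "id" without the "<") the same record is typically padded to 16 bytes, which is one more reason the 12 bytes should be read as a lower bound.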

Device Based KPI Count: 10 (CPU Utilization, Memory Utilization, Uptime etc.)
Interface Count: 3
Interface Based KPI Count: 10 (Throughput, Utilization, Speed, Packet Loss, Delay, Queue etc.)
Device Count: 100

10 + 3*10=40 KPIs per device.

40 KPIs at 12 bytes each = 480 bytes per device poll.
For 100 devices: 48,000 bytes for the whole network poll = 48 KByte.

Assuming one poll per minute (60 * 24 = 1,440 polls per day):
48 KByte * 60 * 24 = 69,120 KByte =~ 70 MByte per day =~ 2.1 GByte per month (raw data)
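The same arithmetic as a small Python sketch, so you can plug in your own numbers. The function and parameter names are mine. It assumes one poll per minute (the 60 * 24 factor), a 30-day month, and decimal units (1 KByte = 1,000 bytes), as in the calculation above. The last line applies the 20% compression figure mentioned earlier, only as an illustration.

```python
BYTES_PER_KPI = 12  # 4-byte integer id + 8-byte double value


def raw_bytes_per_day(devices, device_kpis, interfaces, interface_kpis,
                      polls_per_day=60 * 24):
    """Minimum raw data volume per day in bytes, excluding metadata."""
    kpis_per_device = device_kpis + interfaces * interface_kpis
    bytes_per_poll = devices * kpis_per_device * BYTES_PER_KPI
    return bytes_per_poll * polls_per_day


per_day = raw_bytes_per_day(devices=100, device_kpis=10,
                            interfaces=3, interface_kpis=10)
per_month = per_day * 30              # assumed 30-day month
compressed_month = per_month * 0.20   # assumed compression to 20%

print(f"{per_day / 1e6:.2f} MByte per day")               # 69.12 MByte per day
print(f"{per_month / 1e9:.2f} GByte per month")           # 2.07 GByte per month
print(f"{compressed_month / 1e6:.2f} MByte compressed")   # 414.72 MByte compressed
```

Changing polls_per_day to 12 * 24, for example, gives the figure for a 5-minute polling interval.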

Please note that this would be the minimum. A PM system may need much more space in order to maintain its retention mechanism. You should be in close contact with your vendor for the correct sizing. But I advise you to do the quick math yourself and compare the results with the vendor’s before sending the purchase order for the disk arrays.