My client asked me to determine the system and hardware requirements for a server dedicated to a specific QlikView application.
The tool basically reads a huge amount of data coming from field tests; now that these tests are fully running, they estimate they will produce around 500 GB of raw data each month.
In the demo and proof of concept we ran with a limited data set (around 3 GB), and RAM usage was about half that (1.5 GB).
Of course, in the demo the script was not optimized yet: I believe we are currently loading a lot of data we don't need, and we could load what remains into memory more efficiently. We also plan an external custom ETL tool that will optimize and prepare the data for QlikView, so a compression ratio of 30% or less is probably achievable.
We will probably convince our customer that at some point older data will need to be archived elsewhere and removed from QlikView (otherwise, goodbye in-memory analysis), but I believe they won't settle for less than one year of data, which would mean 6 TB if the estimate holds.
My question here is whether you have any experience with volumes of this sort.
How is performance impacted by such an amount of data?
I obviously can't tell my customer they need 2 TB of RAM (roughly 30% of 6 TB); do you think some sort of paging might be an option?
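To make the sizing explicit, here is a rough back-of-the-envelope calculation. This is only a sketch: the 30% ratio is the hoped-for figure after the custom ETL, while our unoptimized demo was closer to 50%, and neither has been validated at full scale.

```python
# Back-of-the-envelope RAM sizing for the QlikView server.
# Assumptions (from the estimates above, not measured at scale):
#   - 500 GB of raw data per month, one year retained
#   - in-memory footprint ~30% of raw size after ETL/optimization
#   - our unoptimized 3 GB demo used ~50% (1.5 GB of RAM)

RAW_GB_PER_MONTH = 500
MONTHS_RETAINED = 12
RAM_RATIO_OPTIMIZED = 0.30   # hoped-for, after custom ETL
RAM_RATIO_DEMO = 0.50        # observed in the unoptimized demo

raw_total_gb = RAW_GB_PER_MONTH * MONTHS_RETAINED       # 6000 GB = 6 TB
ram_optimized_gb = raw_total_gb * RAM_RATIO_OPTIMIZED   # 1800 GB ~ 1.8 TB
ram_demo_gb = raw_total_gb * RAM_RATIO_DEMO             # 3000 GB = 3 TB

print(f"Raw data for one year:          {raw_total_gb} GB")
print(f"RAM needed (optimized, 30%):    {ram_optimized_gb:.0f} GB")
print(f"RAM needed (unoptimized, 50%):  {ram_demo_gb:.0f} GB")
```

Even in the optimistic case the result lands near the 2 TB figure mentioned above, which is why I'm asking about alternatives to keeping everything in memory.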
HP servers running Intel can go up to 2 TB of memory, so you could tell your customer he needs that. But even so, how will the app perform? This will have to be delivered with an aggregated view of the data just to get the size down so QlikView performs well for the end user, with drill-down into detail when appropriate.