20 million doesn't seem like that much. Where are you extracting that data from? Are you sure you don't have a network performance problem?
Besides that, do you really need to group by Year? I don't know your requirements, but personally I don't see the need for it. Sum is quite a fast operation in the frontend, so if your information is already grouped by Employee, it should be fast to get the yearly sum of that data.
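For example (a sketch only, assuming fields named EmployeeID, Year and Sales rather than your actual model), a straight table with EmployeeID and Year as dimensions and this measure gives the yearly sums with no script-side GROUP BY at all:

// Chart measure, evaluated in the frontend per EmployeeID/Year combination
Sum(Sales)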
Please clarify your requirement and explain how you are extracting the data, so people can help in a more focused way.
The data volume is 200 million; sorry, it's not 20 million.
Data loads fast from QVD and in the resident load, but summing the measures (6 fields) with a GROUP BY over 8 fields takes over 30 minutes and uses 90% of the available memory.
We also have to add a calculation on the date, because orders picked until 3 o'clock the next day have to be recorded on the current day (one way to express this cutoff is sketched below the list of ideas). This will take even more time, which will push the load beyond the limits of our timeframe.
Does anyone have an idea where to make changes and where to add the calculation, following best practice?
- build up the Where Exists() statements?
- speed up the GROUP BY?
I expect there will be multiple "little" solutions, so every idea is helpful.
No need for a full new script, just tips and tricks; then I will figure it out.
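For the 3 o'clock cutoff, one common pattern is to shift the timestamp back three hours before flooring it to a date, so picks before 03:00 land on the previous day. A sketch, assuming a timestamp field named PickTimestamp:

// Picks up to 03:00 the next day count towards the current day:
// subtract 3 hours (3/24 of a day), then floor to a whole date
Date(Floor(PickTimestamp - 3/24)) AS OrderDate

Note that if you want to group by this date in the same load, Qlik cannot group by the alias; the expression has to be repeated in the GROUP BY clause.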
200 million records is quite a lot of data for an aggregation load with 6 measures and over 8 fields. This will never be fast, and your load times of 30-60 minutes aren't really bad. You might save time if you could remove some of the fields, keeping only one or two key fields and adding the other fields back afterwards per joining or mapping (sketched below). Also a vertical and/or horizontal split of the task, with the parts added together again later, might be useful, especially if your system runs out of memory and starts swapping to virtual RAM. Another point which might reduce the load time of this task is if the data are already sorted when they are loaded.
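A sketch of the "aggregate on one or two key fields, add the rest back afterwards" idea (all table, file and field names here are placeholders, not your actual model):

// Map a descriptive field back instead of carrying it through the GROUP BY
RegionMap:
MAPPING LOAD OrderID, Region
FROM Orders.qvd (qvd);

AggOrders:
LOAD OrderID,
     Sum(Sales) AS Sales
FROM Orders.qvd (qvd)
GROUP BY OrderID;

Result:
LOAD *,
     ApplyMap('RegionMap', OrderID, Null()) AS Region
RESIDENT AggOrders;
DROP TABLE AggOrders;

Fewer fields in the GROUP BY means fewer distinct groups and less memory, which is usually where the time goes. This only works, of course, when the re-added field is fully determined by the key.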
But before playing with a variety of rather small improvements, I suggest using an incremental approach for loading and transforming the data. Here you will find various postings on this topic and on how data can be loaded in an optimized mode: Advanced topics for creating a qlik datamodel.
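A minimal sketch of such an incremental aggregation (file and field names are assumptions, and it presumes AggSales.qvd already exists from one initial full run):

// 1. Find the last day that was already aggregated
LastDay:
LOAD Max(OrderDate) AS MaxDate
FROM AggSales.qvd (qvd);
LET vMaxDate = Num(Peek('MaxDate', 0, 'LastDay'));
DROP TABLE LastDay;

// 2. Aggregate only the new records
AggSales:
LOAD OrderDate, EmployeeID,
     Sum(Sales) AS Sales
FROM Orders.qvd (qvd)
WHERE OrderDate > $(vMaxDate)
GROUP BY OrderDate, EmployeeID;

// 3. Append the already-aggregated history and store back
CONCATENATE (AggSales)
LOAD * FROM AggSales.qvd (qvd)
WHERE OrderDate <= $(vMaxDate);
STORE AggSales INTO AggSales.qvd (qvd);

Regarding the Where Exists() question above: a QVD load only stays optimized when the sole WHERE clause is a single-parameter Exists() on one field; any other condition, like the date comparisons here, forces a slower standard load.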
Try this code to see whether and how much it reduces the time needed to calculate the two tables. Check with 100 million or even 25 million rows first:
Data:
LOAD *, EmployeeID & '-' & Year AS EmployeeYear
FROM Orders.qvd (qvd); // placeholder source name

EmployeeSales: // Leave aggr. data in separate table, no join
LOAD EmployeeYear, Sum(Sales) AS Sales // rename if it collides with the base table
RESIDENT Data
GROUP BY EmployeeYear;
Just like Peter, I would suggest creating an ID for those 8 fields that you need to aggregate over. Create it with AutoNumber so it performs faster, and then use it in the aggregation. Later you can LEFT JOIN the result to your initial table, or use that ID field to connect both tables.
Either way you will get a lot of different IDs, and as was said before, 200 million rows will always take time to aggregate. Be sure to aggregate over the minimum number of fields possible.
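A sketch of that AutoNumber approach (the eight grouping fields F1..F8, the six measures M1..M6 and the file name are placeholders):

// Build one integer key over the 8 grouping fields, then group by the key only
Facts:
LOAD *,
     AutoNumber(F1&'|'&F2&'|'&F3&'|'&F4&'|'&F5&'|'&F6&'|'&F7&'|'&F8,
                'GrpKey') AS %GroupID
FROM Orders.qvd (qvd);

AggFacts:
LOAD %GroupID,
     Sum(M1) AS SumM1, Sum(M2) AS SumM2, Sum(M3) AS SumM3,
     Sum(M4) AS SumM4, Sum(M5) AS SumM5, Sum(M6) AS SumM6
RESIDENT Facts
GROUP BY %GroupID;

// Either leave both tables linked by %GroupID, or:
// LEFT JOIN (Facts) LOAD * RESIDENT AggFacts;

The '|' delimiter prevents different field combinations from collapsing into the same concatenated string, and grouping by a single integer key should be cheaper than comparing eight fields per row.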