Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
vicn1390
Contributor II
Contributor II

Why Sum() is so slow in a load script?

Hi!

I am currently creating aggregated data files (sales grouped by week, month).

What I found is that function Sum (and, actually, any other aggregation function) works very slow in a load script.

I made simple test script. It creates a table "id - value" with 10 million lines and then calculates total sums per "id".

t:

LOAD

Mod(RecNo(), 10) as id,

Mod(RecNo(), 100) as sales

AutoGenerate 10000000;

total:

LOAD

id,

sum(sales) as total_sales

Resident t

Group by id;

Here is a fragment from the log-file generated by the script execution.

2018-09-28 15:46:58 0038 LOAD

2018-09-28 15:46:58 0039 Mod(RecNo(), 10) as id,

2018-09-28 15:46:58 0040 Mod(RecNo(), 100) as sales

2018-09-28 15:46:58 0041 AutoGenerate 10000000

2018-09-28 15:47:00      2 fields found: id, sales,

2018-09-28 15:47:00      10 000 000 lines fetched

2018-09-28 15:47:00 0043 total:

2018-09-28 15:47:00 0044 LOAD

2018-09-28 15:47:00 0045 id,

2018-09-28 15:47:00 0046 sum(sales) as total_sales

2018-09-28 15:47:00 0047 Resident t

2018-09-28 15:47:00 0048 Group by id

2018-09-28 15:47:10      2 fields found: id, total_sales,

2018-09-28 15:47:10      10 lines fetched

As we can see the execution of "LOAD - sum() - group by" statement takes about 10 seconds. But when it goes in the application in a straight table chart this the same "sum()" expression calculates instantly.

When aggregating real table (10 million records and 20 fields)  with 7 sums and 1 item in "group by" clause script execution takes more than 2 minutes. But the calculation is still instant in a straight table!

Why my sums are so slow?

Please, tell me if it is something wrong with my Load. Is there any way to make my aggregations faster?

Thank you!

Victor.

5 Replies
Miguel_Angel_Baeyens

Any object in QlikView is using values and pointers primarily (also vectors and states, etc.) as a result of the in memory data model that has been loaded.

During the execution of the script, though, all those values and pointers do not exist yet, as they are still being created while the script runs, so my wild guess goes to that the load script is doing a lot of other stuff -although not visible- that the object has already done.

I don't think anybody here but perhaps Henric or anyone from Qlik R&D can tell this for sure, but also likely, some operations in the script are single threaded for a longer period of time, as the script has to check all those temporary pointers and distinct values or create ones if they don't exist, while most of the things that happen when you use the QlikView app in memory are most often multithreaded.

Last but not least, until recently, reading RESIDENT was slower than reading from QVDs, for example, which could also impact how long it takes to just do the RESIDENT load, even without aggregations.

In general, I avoid doing aggregations and JOINs in the script when millions of rows are involved in either table, because QlikView is not a database (not in the sense we all are accustomed to) and likewise the script is not an ETL, although it can extract, transform and load.

tomasz_tru
Specialist
Specialist

Qlik is not so good in data aggregation on script level (it can't benefit it's engine yet) so it is better to do GROUP BY on the data warehouse side.

marcus_sommer

In some way you already noticed that it is not mandatory to aggregate the data within the script to get a fast performance within the UI. As far as you could use rather simple calculations by avoiding aggr() and if-loops a qlik application will be quite performant even with large datasets. Also the savings on the RAM side might not be so significantly to apply always an aggregation where you don't need the granularity below it.

Beside this you could also use an incremental load-approach by aggregating the data. On the end of the following posting you will find a lot of examples about incremental loadings by using exists(): Advanced topics for creating a qlik datamodel.

- Marcus

vicn1390
Contributor II
Contributor II
Author

Marcus, thank you for providing me with the link!

These topics are very helpful for me and the other guys who are new with QlikView.

You have done a great job collecting all these useful links in one post.

Thank you!

Victor.

waszcma1
Partner - Creator II
Partner - Creator II

Well... I have noticed the same,


when I load data from DB it is much faster when I do aggregation on SQL lvl then it is done while I load the same data from QVD.

the difference on my data set exaple was 8 sec (SQL), 30 sec (QVD)