Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Why is SUM in load script so slow?

I have a load script where I am loading a table with ~1M rows.

The next thing I do is reload it (resident load) as a different table and do a SUM on one of the fields (group by two other fields).

Table2:

LOAD

   Field1,

   Field2,

   Sum(Metric) AS SumOfMetric

Resident Table1

Group By

   Field1,

   Field2;

It takes a very long time to run (>20 minutes), which is surprising since the original 1Mrow table loaded in a few seconds.

I have a server with 16 cores, but only one core is being used at any given time during the SUM.

Any idea why this is so slow or how I can speed it up?

The same operation in a chart in the UI is super fast!

5 Replies
Not applicable
Author

Did you get any response for solving your problem?

I got the same issue and don't know how to solve.

Best Regards

Ramon

Not applicable
Author

Hey Ramon,

No, I have not gotten any response. I am just trying to do all of my data transformation & aggregation in my SQL scripts, prior to loading it into QVDs.

Does anyone know why QlikView is so much slower than a database engine when doing simple aggregations?

rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

This is surprising. Can you post the logfile where it runs slow?

-Rob

Not applicable
Author

Hey Rob,

Thanks for taking a look. Please see attached log file.

The delay happens right at the end, around 3:49 in the log. It take ~8 minutes to calculate a SUM on 1.2 million rows, grouping by 2 other fields.

Any help is appreciated!

Thanks,

     -Simon-

rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

Simon,

I think the answer is that you are processing many more than 1.2M  rows in the sum.

Table OOO starts with 1.2M rows. Then you join the Rooms table onto OOO. Assuming 100 rooms per property, table OOO now has 120M rows.

The sum and group by is therefore taking in 120M rows. The 1.2M reported in the log is the grouped by count, not the input row count.

So the timing may not be all that unreasonable. Hard to say without knowing the actual count of OOO.

I'm not sure why a sum() on a resident table is using only a single core. That may be an additional question once you confirm if my explanation makes sense.

-Rob

http://robwunderlich.com