Skip to main content
Announcements
Have questions about Qlik Connect? Join us live on April 10th, at 11 AM ET: SIGN UP NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

scripting to get summary/aggregate statistics on Cartesian product of two fields in table

Hello! I've tried Google and the QV forums, but this is a fairly complicated one to explain and search for. I don't think I'm missing an obvious solution, but then again...

I have a single table with ~30 million records. To simplify greatly, assume the table looks like this:

DateSerialCodeValue
41590AA1
41589ZZ.5
41590AA2
41589ZZ1.1
41590AA3
41589ZZ.75

For each unique DateSerial+Code, I need summary/aggregation statistics like count() and sum(). I have "solved" this problem, but my solution is incredibly slow! The Cartesian product of DateSerial and Code is ~60,000; there are ~300 unique DateSerials and ~200 unique Codes.

After already scrubbing out nulls in DateSerial and Code, I loaded distinct values of each of those fields into their own new tables, then I did a join on the tables and concatenated the fields' values to get my ~60,000 unique DateSerial+Code values (example: "41590_AA"). I loop through that Cartesian table using a For Next loop hanging on RecNo(); for each Cartesian unique, I left join from my master table. I then left join onto that table the aggregation values of count(), numbercount(), and sum(). Finally, I take the topmost row from each Cartesian table and store that into a storage table.

The above craziness works, but it's incredibly slow. I end up with a table that looks like this:

Cartesian IndexCounted TotalNum Counted TotalSum Total
41590_AA20020075
41589_ZZ1951952.25
41500_XX2012018.675309

For reference, my script has been running for ~3 hours, and its on record ~250 of the ~60,000 Cartesian uniques. Help! I appreciate any guidance you can offer.

Unfortunately, due to the nature of the data, I can't really put the .QVW up here for reference.

Thanks!

1 Solution

Accepted Solutions
swuehl
MVP
MVP

A FOR NEXT loop probably shows really bad performance. I must admit I haven't fully understood the way you create your aggregates (store the top most row?), I think a copy of your script would really help me understanding this.

Have you already tried a GROUP BY LOAD of your table to create your aggregates? How does this perform?

You may still need to create the cross product table to get a table with all combinations, but then you just need to LEFT JOIN this table with your aggregation table and maybe go through the table once more to fill in zero values.

View solution in original post

2 Replies
swuehl
MVP
MVP

A FOR NEXT loop probably shows really bad performance. I must admit I haven't fully understood the way you create your aggregates (store the top most row?), I think a copy of your script would really help me understanding this.

Have you already tried a GROUP BY LOAD of your table to create your aggregates? How does this perform?

You may still need to create the cross product table to get a table with all combinations, but then you just need to LEFT JOIN this table with your aggregation table and maybe go through the table once more to fill in zero values.

Not applicable
Author

Wow... you are absolutely right. I didn't need the For..Next at all. I took your advice by left joining a the aggregate details using a group by clause. It reduced run time by 99.9 percent!

Thank you!