somebodysomeone
Contributor II

AGGR OPTIMIZATION

Hello there,

 

I have 40 million records and 5 million unique customer IDs. The aggr below gives correct results but is slow. What can I do to make it faster? I want to split customers into 3 groups according to how their volume changes between two dates: increase, decrease, no change.

 

 

=if(aggr((sum({<FIRSTDATE={"$(=date(T1,'DD.MM.YYYY'))"}>} VOLUME) -
          sum({<SECONDDATE={"$(=date(MAX(T2),'DD.MM.YYYY'))"}>} VOLUME)), [CUSTOMERID]) > 0, 'INCREASE',
 if(aggr((sum({<FIRSTDATE={"$(=date(T1,'DD.MM.YYYY'))"}>} VOLUME) -
          sum({<SECONDDATE={"$(=date(MAX(T2),'DD.MM.YYYY'))"}>} VOLUME)), [CUSTOMERID]) < 0, 'DECREASE',
 'ZERO'))

 

I cannot pre-aggregate it in the script or in QVD files because there are many dates and many possible combinations between them. I have to calculate and aggregate it in the chart table.
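
For reference, the kind of script-side pre-aggregation I mean (and cannot use here, because the two dates are chosen at runtime) would be a rough sketch like the one below; Facts is only a placeholder name for the real fact table:

// Hypothetical sketch: one row per customer and date shrinks the data,
// but it still cannot answer an arbitrary pair of selected dates
// without an aggr() in the chart.
CustomerDaily:
LOAD
    CUSTOMERID,
    FIRSTDATE,
    Sum(VOLUME) AS DailyVolume
RESIDENT Facts
GROUP BY CUSTOMERID, FIRSTDATE;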

 

Thanks for your help.

3 Replies
marcus_sommer

The expression could be optimized by removing the second aggr, for example with something like this:

pick(match(
   sign(aggr(sum({<FIRSTDATE={"$(=date(T1,'DD.MM.YYYY'))"}>} VOLUME) -
             sum({<SECONDDATE={"$(=date(MAX(T2),'DD.MM.YYYY'))"}>} VOLUME), [CUSTOMERID])),
   -1, 0, 1), 'DECREASE', 'ZERO', 'INCREASE')

If this is still too slow you will need to consider changes within your data model: reduce the number of tables to which the involved fields belong and simplify their associations (i.e. develop the data model in the direction of a star schema or a big flat table).
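
A rough sketch of what such a flat-table step could look like in the script (Facts, CustomerDim and CustomerSegment are placeholder names, not your actual model):

// Sketch only: join the dimension fields needed by the chart into the
// fact table, so the expression reads from one table instead of
// following several associations.
FlatFacts:
LOAD CUSTOMERID, FIRSTDATE, SECONDDATE, VOLUME
RESIDENT Facts;

LEFT JOIN (FlatFacts)
LOAD CUSTOMERID, CustomerSegment
RESIDENT CustomerDim;

DROP TABLES Facts, CustomerDim;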

- Marcus

 

somebodysomeone
Contributor II
Author

Unfortunately, it's working but slower than the previous version 😕

I have a star schema and it has just 4 dimensions.
marcus_sommer

Usually the above-mentioned approach of removing redundant calculations from an expression improves performance (an if() always calculates all branches completely, and the true/false check of the branches is applied afterwards).
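
As a simplified illustration with a placeholder field Value:

// if() version: the same aggregation is written (and calculated) in both branches
if(sum(Value) > 0, 'INCREASE', if(sum(Value) < 0, 'DECREASE', 'ZERO'))

// pick(match(sign())) version: the aggregation is calculated only once
pick(match(sign(sum(Value)), -1, 0, 1), 'DECREASE', 'ZERO', 'INCREASE')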

Nevertheless, there might be options in your data model to optimize these expressions. If the expression uses fields from all 4 dimensions and the fact table, the virtual table on which the calculation is performed might be quite heavy - and although aggregations like the sum() here are multi-threaded, the creation of the virtual table is a single-threaded process. More on this can be found here: Logical-Inference-and-Aggregations

- Marcus