Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
CavellMok
Contributor II
Contributor II

Aggr() Optimization on Large Dataset with User Input

Hi everyone,

I'm frustrated in this case because after 1 week hard work of building the dashboard, turns out the performance is unacceptable. I'm sure there are a smarter way to do it and I can't figure it out alone. Open to any suggestions as long as it works with huge dataset.

Background:

Main purpose of the dashboard is to allow user to quickly identify bulk shipment (received or shipped) to designated countries/for designated type. User can input quantity that they want to check(highlighted in yellow in above picture) for specific country/type.

2 datasets - Customer, Transaction

Sample Data.PNG

 

If user input 200 in USA quantity, it will only show the customer who received >= 200 on that day(group by date,cust_id). In this case, customer 1001 will be filter out because received from USA = 0 which is less than 200. If user input -10 in USA quantity, 1001 will be shown because total shipped is 149 which pass 10.

I already developed a solution but it doesn't work with large volume data.

My dataset volume: Customer-3M rows, Transaction-3M rows

My Solution(doesn't work with huge dataset)

vCountryUSA : USA quantity input by user (default is 0)

vCountryUSAFlag

aggr(nodistinct {1}
 if( $(vCountryUSA) > 0,//User input value greater than 0
     if(ttl_received_qty= 0 and ttl_shipped_qty <> 0,1,//If no inflow txn, not pass (set = 1) 
    if(in_out = 'Received' and country = 'USA',
                  if(sum(Fund) >= $(vCountryUSA), 0,1)//pass threshold = 0
                  )
   )   
 ,
 if( $(vCountryUSA) < 0,//else if
     if((ttl_received_qty <> 0) and ttl_shipped_qty = 0,1,//If no outflow txn, not pass (set = 1)    
    if(in_out = 'Shipped' and country = 'USA',
                  if(sum(Fund) <= $(vCountryUSA), 0,1)//pass threshold = 0
                  )                
   )
      ,0 )//if user input = 0
 )
,date, cust_id, in_out, country)

 

Create the same Flag for Vietnam(vCountryVietnamFlag )

vCountryFinalFlag

aggr(nodistinct {1}
if(
If(NumericCount($(vCountryVietnamFlag)), Sum($(vCountryVietnamFlag)), NULL()) = 0 
and If(NumericCount($(vCountryUSAFlag)), Sum($(vCountryUSAFlag)), NULL()) = 0 
                   ,0,1)
,date, cust_id)

Repeat the same for Type

vFinalFlag

Aggr(
 if(Sum($(vCountryFinalFlag)) > 0
  or Sum($(vTypeFinalFlag))> 0
    , 1,0)
,date, cust_id)

In the sheet, I put this expression, so it only shows the customer who meet the requirement

Sum({< cust_id = {"=sum($(vFinalFlag)) = 0"}>}Fund)

Thanks!

 

 

21 Replies
colinodonnel
Creator II
Creator II

Hello there, 

 

Aggr is useful but it can be resource heavy.

Some use them for Averages when a simple sum ()  / Count distinct () does the job fine.

I'd recommend pushing some of the If Statements back to load script and doing the calcs there, by creating boolean flags. 

e.g. ttl_received_qty= 0 and ttl_shipped_qty <> 0 

This could be replaced by a column in the data 

And then in the chart statement use Set Analysis  sum ({<NewFlag = {1}>} fund)  or even Sum (fund) * NewFlag 

(just make sure to test the outputs for either option for accuracy as the latter has failed me on the odd occasion)

sunny_talwar

One option is to create a key field using date, cust_id, in_out, country and use this new field in your set analysis.

LOAD date&'|'&cust_id&'|'&in_out&'|'&country as KeyField

and now use your condition within set analysis like this

Avg({<KeyField = {"=not (ConditionHere)"}>} 1)

You might have to play around with it, but I hope you get the idea. 

colinodonnel
Creator II
Creator II

Hi Sunny,

Could you explain how that works. 

a little lost by the advanced expression within the SA, actually most parts plus

Why use Avg (1)

Why use not

How would this be used within the AGGR

 

e.g.  KeyField has a field value as follows (based on  the first row in the data in the table)

20190816 & '|' & 1001 &'|'& Receive &'|' & Vietnam

What would "ConditionHere" look like.

Many thanks

sunny_talwar

Can you share an app where you have set this up with the dummy data and the variables to test out and show what I mean?

colinodonnel
Creator II
Creator II

Will have to leave that that to person who posted the question.

sunny_talwar

Sorry about that, I thought you were the original poster of this thread. That is completely my bad. I could have tried to explain this without the example... but this is not a simple expression to explain and that is why I asked for a sample.

colinodonnel
Creator II
Creator II

no worries 🙂

hopefully it will get posted at some stage to help my learnings too, thanks

CavellMok
Contributor II
Contributor II
Author

Thanks for all the replies! Sharing the app using sample data.

Sunny suggested to create a key field, is it create one for country  date + cust_id + in_out+ country and another for type date + cust_id + in_out+ type? I also tried to do it in set analysis. However, seems null can't compare.

Example:

User input

USA: -100

China:0

Transaction

Customer A USA -100

Customer B China -10

Expected Result

Show Customer A as it meets user input. Not show customer B because it doesn't meet user input.

Result from below set analysis

No customer showed. Because when look at customer A =Sum({<country = {'China'},in_out = {'Shipped'}>}qty is null . Please help 

Sum({<cust_id = {"=Sum({<country = {'China'},in_out = {'Shipped'}>}qty) <= $(vCountryChina)"}>
* <cust_id = {"=Sum({<country = {'USA'},in_out = {'Shipped'}>}qty) <= $(vCountryUSA) "} >} 
qty)

sunny_talwar

What is Customer A in your example? I mean your sample gives a different example, but I am not sure what is the output you need. Can you explain what is needed from the sample qvf attached based on few different inputs?