Skip to main content
Announcements
Accelerate Your Success: Fuel your data and AI journey with the right services, delivered by our experts. Learn More
cancel
Showing results for 
Search instead for 
Did you mean: 
CavellMok
Contributor II
Contributor II

Aggr() Optimization on Large Dataset with User Input

Hi everyone,

I'm frustrated in this case because after 1 week hard work of building the dashboard, turns out the performance is unacceptable. I'm sure there are a smarter way to do it and I can't figure it out alone. Open to any suggestions as long as it works with huge dataset.

Background:

Main purpose of the dashboard is to allow user to quickly identify bulk shipment (received or shipped) to designated countries/for designated type. User can input quantity that they want to check(highlighted in yellow in above picture) for specific country/type.

2 datasets - Customer, Transaction

Sample Data.PNG

 

If user input 200 in USA quantity, it will only show the customer who received >= 200 on that day(group by date,cust_id). In this case, customer 1001 will be filter out because received from USA = 0 which is less than 200. If user input -10 in USA quantity, 1001 will be shown because total shipped is 149 which pass 10.

I already developed a solution but it doesn't work with large volume data.

My dataset volume: Customer-3M rows, Transaction-3M rows

My Solution(doesn't work with huge dataset)

vCountryUSA : USA quantity input by user (default is 0)

vCountryUSAFlag

aggr(nodistinct {1}
 if( $(vCountryUSA) > 0,//User input value greater than 0
     if(ttl_received_qty= 0 and ttl_shipped_qty <> 0,1,//If no inflow txn, not pass (set = 1) 
    if(in_out = 'Received' and country = 'USA',
                  if(sum(Fund) >= $(vCountryUSA), 0,1)//pass threshold = 0
                  )
   )   
 ,
 if( $(vCountryUSA) < 0,//else if
     if((ttl_received_qty <> 0) and ttl_shipped_qty = 0,1,//If no outflow txn, not pass (set = 1)    
    if(in_out = 'Shipped' and country = 'USA',
                  if(sum(Fund) <= $(vCountryUSA), 0,1)//pass threshold = 0
                  )                
   )
      ,0 )//if user input = 0
 )
,date, cust_id, in_out, country)

 

Create the same Flag for Vietnam(vCountryVietnamFlag )

vCountryFinalFlag

aggr(nodistinct {1}
if(
If(NumericCount($(vCountryVietnamFlag)), Sum($(vCountryVietnamFlag)), NULL()) = 0 
and If(NumericCount($(vCountryUSAFlag)), Sum($(vCountryUSAFlag)), NULL()) = 0 
                   ,0,1)
,date, cust_id)

Repeat the same for Type

vFinalFlag

Aggr(
 if(Sum($(vCountryFinalFlag)) > 0
  or Sum($(vTypeFinalFlag))> 0
    , 1,0)
,date, cust_id)

In the sheet, I put this expression, so it only shows the customer who meet the requirement

Sum({< cust_id = {"=sum($(vFinalFlag)) = 0"}>}Fund)

Thanks!

 

 

21 Replies
CavellMok
Contributor II
Contributor II
Author

Attaching qvf with updated aggr().

In the sheet, all three quantity (received quantity, shipped quantity, net quantity) and the net quantity in the table applied [transaction.cust_id] = {"=sum($(vFinalFlag)) = 0"}>". vFinalFlag is a variable derived from vCountryFinalFlag and vTypeFinalFlag. While vCountryFinalFlag is derived from vCountryChina, vCountryUSA and vCountryVietnam.

marcus_sommer

I don't have Sense available and couldn't look into your application but if I look again on your descriptions and your screenshot I think again that your whole approach is rather not suitable.

It starts in each case with the datamodel. Your multi-field KEY prevents synthetic keys but I doubt that's really useful. I assume that customer isn't a dimension-table (such key doesn't make sense in this case) else a fact-table. Is it a fact-table it needs to be merged with the transaction-table (it might not be easy but necessary).

Other things which be useful in the script could be to fill NULL with zero and/or to create missing records. Further to transform in_out into one/several numeric flag-fields which could be also applied as multiplicator (in general are numeric comparings faster as to compare strings.

In regard to your aggr it's very bad that cust_id didn't belonged to the transaction-table. Further there surely ways to reduce the efforts within the aggr to not check on >= 0 and similar else to apply some range-function, alt(), using fabs() and applying the -+ laterly, moving the multiple calculations within the check from nested if-loops tp pick(match()). I think there is much improvement possible - especially if you rethink the usability for you and the users to avoid all the variables and apply instead real field-associations and selections. Variables have the aim to simplify things - in your case it seems rather the opposite.

It's quite probably not want do you want to hear but solving problems within the script instead of the UI is often much easier (and sometimes the only way).

- Marcus