Skip to main content
Announcements
Qlik Introduces a New Era of Visualization! READ ALL ABOUT IT
cancel
Showing results for 
Search instead for 
Did you mean: 
T_Qlik
Contributor III
Contributor III

optimize way to Join between two group by statements

Hello Everyone,

Below code works for me on limited data set but on full load app throws Object out of memory.

Is there better/optimize way to tackle below code?

Granularity of data is at month level and Join condition between two tables is below Composite Key.

without left join..numbers from Table 'ABC' are not giving the desired results..

 

ABC:
LOAD
AutoNumber(Month & '|' & Cust_ID & '|' & Brand) as Key,
Month,
    Status,
    Brand ,
    TOTAL_CUSTOMERS
;
select 
distinct Cust_ID,
Month,
    Status,
    Brand ,
    Count(distinct Cust_ID) TOTAL_CUSTOMERS 
from table1
Group by 1,2,3,4; 
 
 
left join (ABC)
XYZ:
LOAD 
    AutoNumber(Month & '|' & Cust_ID & '|' & Brand) as Key,
    Outlet_FLAG,
    Brand AS Brand_XYZ,
    Expected_Cust   //need to calculate count of customer where expected cust= 'Yes'..Count(Dist Cust_id)
FROM XYZ.qvd(qvd);
Labels (2)
9 Replies
ali_hijazi
Partner - Master II
Partner - Master II

you can simply keep the tables as are
qlik will link them via Key
and you calculate the count on the UI

Count(Distict {<Expected_Cust={"yes"}>} Cust_id)

I can walk on water when it freezes
T_Qlik
Contributor III
Contributor III
Author

I know but in my case data volume is on higher side and app throws out of object memory error

T_Qlik
Contributor III
Contributor III
Author

Plus without left join count from table 'ABC' is not giving the desired O/P.

marcus_sommer

It's not clear for me - the load runs into a memory error or afterwards any UI object?

T_Qlik
Contributor III
Contributor III
Author

Load runs into memory error

marcus_sommer

The load with just a few fields and the group by inside of the sql looked rather small and I wouldn't expect memory issues unless the datasets are really huge and going into billions of records. Therefore how many records are there and how many free RAM is available? Further make sure that there are no further data within the data-model before you perform this joining.

T_Qlik
Contributor III
Contributor III
Author

Close to 1.3 billion records from table 'ABC'.

marcus_sommer

There are several approaches thinkable which may reduce the RAM consumption but the most obvious one would be to implement an incremental logic.

Personally I would tend to distribute such task to at least two layers and the first one would be the slicing of ABC in YYYYMM periods and storing them into qvd's. Similar would it be with the XYZ join-data and the join would then be applied within the next layer.

Beside this you may try to remove the autonumber() from the joins because I'm not sure at which work-step the autonumber() released the RAM. Also a storing of the entire ABC as qvd is an option and would mean that the autonumber() couldn't remain. Depending on the Cust_ID and Brand you might be able to create a numeric concatenation instead of a string one.

A further reducing might be the filtering of the join-table against the existing ABC key-values - maybe also in a workstep in beforehand and then loading it as mapping table and merging then the data per:

subfield(applymap())

instead of the join.

Before you goes in any direction it would be helpful just to load ABC and the looking how many RAM is consumed and of course further available to get a feeling if your environment is general suitable sized for your wanted data-set.

 

rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

A couple of thoughts. 

AutoNumberHash128(Month, Cust_ID, Brand) as Key 

may use less memory than AutoNumber. 

You mentioned you were not getting correct results when kept as separate tables. That may be because you have extra rows that are remove by the Left Join. Try keeping as separate tables but with a where exists() to filter. 

XYZ:
LOAD 
    AutoNumberHash128(Month, Cust_ID, Brand) as Key,
    Outlet_FLAG,
    Brand AS Brand_XYZ,
    Expected_Cust   //need to calculate count of customer where expected cust= 'Yes'..Count(Dist Cust_id)
FROM XYZ.qvd(qvd)
Where Exists(Key, AutoNumberHash128(Month, Cust_ID, Brand) )
;
 
1.3 Billion rows can be challenging 🙂
 
-Rob