Hi Community,
Can someone please suggest how to improve the performance of the script below? The GROUP BY step is taking a lot of time.
Below is my code.
I need to compute sums in the transformation.
TEMP_AGGR:
LOAD CNTX_REPORTING_DATE&'|'&SCENARIO_NAME&'|'&SCENARIO_ID&'|'&CUSTOMER_FIELD1&'|'&HBIO_1201S&'|'&HBUS_3607S&'|'&
HMUS_3285S&'|'&HNAH_3460S&'|'&HSBH_1100S&'|'&HSI_32900&'|'&HTCD_37130&'|'&HUSI_3401S&'|'&TRANCHE_COLLATERAL_TYPE&'|'&GENERIC_FIELD9&'|'&
GUARANTEE_ATTRIBUTE_6&'|'&COLLATERAL_ATTRIBUTE_5&'|'&RWALineItemKey&'|'&Memoranda_LineItemKey&'|'&BSLineItemKey&'|'&%MEMORANDA_KEY as NEWKEY,
column1,column2,column3, etc ...(around 30 columns)
resident $(vTable) ;
NoConcatenate
TEMP_AGGR_1:
LOAD
sum(column1) as column1, sum(column2) as column2, sum(column3) as column3, etc ...(around 30 columns)
NEWKEY
resident TEMP_AGGR
group by NEWKEY;
drop table TEMP_AGGR;
NoConcatenate
$(vAggrTable):
LOAD
SubField(NEWKEY,'|',1) as CNTX_REPORTING_DATE,
SubField(NEWKEY,'|',2) as SCENARIO_NAME,
SubField(NEWKEY,'|',3) as SCENARIO_ID,
SubField(NEWKEY,'|',4) as CUSTOMER_FIELD1,
SubField(NEWKEY,'|',5) as HBIO_1201S,
SubField(NEWKEY,'|',6) as HBUS_3607S,
SubField(NEWKEY,'|',7) as HMUS_3285S,
SubField(NEWKEY,'|',8) as HNAH_3460S,
SubField(NEWKEY,'|',9) as HSBH_1100S,
SubField(NEWKEY,'|',10) as HSI_32900,
SubField(NEWKEY,'|',11) as HTCD_37130,
SubField(NEWKEY,'|',12) as HUSI_3401S,
SubField(NEWKEY,'|',13) as TRANCHE_COLLATERAL_TYPE,
SubField(NEWKEY,'|',14) as GENERIC_FIELD9,
SubField(NEWKEY,'|',15) as GUARANTEE_ATTRIBUTE_6,
SubField(NEWKEY,'|',16) as COLLATERAL_ATTRIBUTE_5,
SubField(NEWKEY,'|',17) as RWALineItemKey,
SubField(NEWKEY,'|',18) as Memoranda_LineItemKey,
SubField(NEWKEY,'|',19) as BSLineItemKey,
SubField(NEWKEY,'|',20) as %MEMORANDA_KEY,
column1,column2,column3, etc ...(around 30 columns)
resident TEMP_AGGR_1;
Concatenate
LOAD *
Resident $(vMemoTable);
drop table TEMP_AGGR_1;
Hi Priya R,
The only way I know of to speed up GROUP BY queries is to use sub-select queries. Try using a subquery to group the data you need.
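If the data originates from a database rather than a resident table, one way to apply this idea is to push the GROUP BY down into the SQL source so the database engine does the aggregation. This is only a sketch; the connection, SOURCE_TABLE, and the column list are placeholders for your actual source:

```qlik
// Sketch: let the database perform the grouping instead of the Qlik engine.
// SOURCE_TABLE and the shortened column list are placeholders.
TEMP_AGGR_1:
LOAD *;
SQL SELECT
    CNTX_REPORTING_DATE, SCENARIO_NAME, SCENARIO_ID,
    SUM(column1) AS column1, SUM(column2) AS column2, SUM(column3) AS column3
FROM SOURCE_TABLE
GROUP BY CNTX_REPORTING_DATE, SCENARIO_NAME, SCENARIO_ID;
```

The preceding `LOAD *;` makes Qlik load whatever the SQL statement returns, so the already-aggregated rows arrive in the data model without a script-side GROUP BY.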
Cheers,
Rohan
I'm not surprised that your script is quite slow - aggregating data within a load is one of the most resource-intensive operations possible, and Qlik cannot run this kind of task multi-threaded; it is executed single-threaded (I don't know whether other tools handle it more efficiently). This is especially true when aggregating 30 columns over a really huge key - and I don't think that concatenating all the fields into this key and splitting them apart again later helps in any way.
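For example, using the field names from the original script, the NEWKEY build-and-split round trip can be dropped entirely by grouping on the dimension fields directly. A shortened sketch (the field list is abbreviated, not tested against your data):

```qlik
// Sketch: aggregate directly over the dimension fields, skipping the
// concatenate-into-NEWKEY / SubField-apart round trip.
$(vAggrTable):
NoConcatenate
LOAD
    CNTX_REPORTING_DATE,
    SCENARIO_NAME,
    SCENARIO_ID,
    // ... remaining grouping fields ...
    sum(column1) as column1,
    sum(column2) as column2,
    sum(column3) as column3
    // ... remaining measures ...
resident $(vTable)
group by
    CNTX_REPORTING_DATE,
    SCENARIO_NAME,
    SCENARIO_ID
    // ... remaining grouping fields ...
;
```

This avoids building a long string per row and the 20 SubField() calls afterwards, although the GROUP BY itself remains single-threaded and expensive.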
My suggestion is to check whether this aggregation table is really needed. With this many grouping fields, the reduction rate from unaggregated to aggregated data will be rather low - I would assume maybe around 50%. So I could imagine that calculations over the unaggregated data are not essentially slower than over the aggregated data.
If you really need aggregated data, you should check whether you could aggregate with fewer grouping fields and/or split the load into several horizontal and vertical parts which are merged afterwards by joining/concatenating/mapping. If this isn't performant enough, you will need an incremental approach which loads the already-aggregated data from a QVD, aggregates only the new/changed records, and appends them to the QVD at the end.
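The incremental approach described above could look roughly like this. Everything here is an assumption about your environment: the QVD path, the variable vLastReloadDate, and the single measure are placeholders, and the delta filter assumes CNTX_REPORTING_DATE identifies new records:

```qlik
// Sketch of an incremental aggregation (paths/variables are placeholders).
// 1) Aggregate only the new/changed records from the source table.
DELTA_AGGR:
LOAD
    NEWKEY,
    sum(column1) as column1
resident TEMP_AGGR
where CNTX_REPORTING_DATE >= '$(vLastReloadDate)'
group by NEWKEY;

// 2) Append the historic aggregates from the QVD, skipping any key
//    that was just re-aggregated in this run.
Concatenate (DELTA_AGGR)
LOAD NEWKEY, column1
from [lib://Data/Aggregated.qvd] (qvd)
where not exists(NEWKEY);

// 3) Store the combined result back for the next reload.
store DELTA_AGGR into [lib://Data/Aggregated.qvd] (qvd);
```

On the first run the QVD will not exist yet, so the concatenate step would need to be wrapped in a file-exists check; after that, each reload only pays the GROUP BY cost for the delta.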
- Marcus