Prog
Contributor II

Need help optimising adding a new column from one table to another

Hello,

For the purpose of my question I'm keeping things generic.

I have two tables:

Table1 (600M rows):

Contains ID_A and ID_B

Table2 (20M rows):

Contains ID_B and ID_C

Both are stored as QVDs.

Now, I want to load data from Table1 and add ID_C from Table2. I also want to create a new column combining ID_A and ID_C.

At the moment I am using a mapping load for Table2, and using ApplyMap both to add ID_C and to create the compound ID.

Map:
Mapping Load
    ID_B,
    ID_C
From Table2.QVD (qvd);

Load
    ID_A,
    ID_B,
    ApplyMap('Map', ID_B) as ID_C,
    ID_A&'-'&ApplyMap('Map', ID_B) as compound_ID
From Table1.QVD (qvd);

The script is taking a while to run as it slowly goes through all 600M rows.

Is there any way of optimising this? Or do I have to accept that 600M rows will take some time?


Cheers


2 Replies
marcus_sommer

The biggest potential to reduce the run-times is probably to enable an incremental approach for this task. That means slicing the QVDs into historical and current data, maybe on the IDs or in some way into YYYYMM.qvd files or similar, and then only loading/mapping the new records/periods.
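A rough sketch of that incremental idea, with heavy assumptions: it presumes a LoadDate field in Table1 (not mentioned in the thread), a variable vLastReload maintained elsewhere, and a result QVD named Table1_mapped_history.QVD - all hypothetical names for illustration:

```
// Sketch only: LoadDate, vLastReload and Table1_mapped_history.QVD are
// assumed names, not part of the original question.

// 1. Load the history already mapped in earlier runs:
History:
Load * From Table1_mapped_history.QVD (qvd);

// 2. Load and map only the records added since the last run:
Concatenate (History)
Load
    ID_A,
    ID_B,
    ApplyMap('Map', ID_B) as ID_C,
    ID_A&'-'&ApplyMap('Map', ID_B) as compound_ID
From Table1.QVD (qvd)
Where LoadDate > '$(vLastReload)';

// 3. Store the combined result back for the next run:
Store History into Table1_mapped_history.QVD (qvd);
```

This way each reload only pays the ApplyMap cost for the new slice instead of all 600M rows.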

If this is really not possible you could try to minimize the dataset on which you apply the mapping. This means first running both QVDs against each other with a where exists(ID) clause. Afterwards the mapping happens only for the remaining records, and the excluded records are added back with a where not exists(ID) logic. Of course such an approach adds overhead and some complexity, but a where exists() clause with a single parameter keeps the load optimized, and therefore it may - depending on your dataset - reduce the overall time.
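A sketch of that split-and-recombine approach, under the assumption that a meaningful share of Table1's ID_B values has no match in Table2 (table names beyond those in the thread are made up):

```
Map:
Mapping Load ID_B, ID_C From Table2.QVD (qvd);

// The set of mappable ID_B values - this is what Exists() checks against.
MappedIDs:
Load ID_B From Table2.QVD (qvd);

// Optimized pass: a single-parameter "where exists" keeps the QVD load
// optimized, so extracting the matchable rows is fast.
Matched:
Load ID_A, ID_B From Table1.QVD (qvd)
Where Exists(ID_B);

Drop Table MappedIDs;

// Apply the mapping only to the matched subset.
Result:
Load ID_A, ID_B,
     ApplyMap('Map', ID_B) as ID_C,
     ID_A & '-' & ApplyMap('Map', ID_B) as compound_ID
Resident Matched;
Drop Table Matched;

// Add back the rows without a match. Loading ID_B under a temporary name
// keeps the Exists() check clean (otherwise each loaded row would make its
// own ID_B "exist" and drop later duplicates). This pass is not optimized.
// Unmapped values pass through unchanged, mirroring ApplyMap's default.
Concatenate (Result)
Load ID_A,
     ID_B_tmp as ID_B,
     ID_B_tmp as ID_C,
     ID_A & '-' & ID_B_tmp as compound_ID;
Load ID_A, ID_B as ID_B_tmp From Table1.QVD (qvd)
Where Not Exists(ID_B, ID_B);
```

Whether this wins depends on how many rows fall into each bucket; if nearly every row matches, the extra passes just add overhead.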

Beside this, also take a look at the available RAM - if Qlik is forced to move data between RAM and the virtual memory on disk, performance decreases significantly. In this case you may add more RAM, or you might move this task to another machine/cluster and/or another time-frame. Further, you may consider a slicing approach without any incremental intention, just to reduce the RAM footprint.

Theoretically it should be a benefit if you create the compound key within a preceding load, so ApplyMap is only called once per row, like:

Load *, ID_A&'-'&ID_C as compound_ID;
Load ID_A, ID_B, ApplyMap('Map', ID_B) as ID_C
From Table1.QVD (qvd);

but I don't think that it will be relevant in the end.

- Marcus

Prog
Contributor II
Author

Hi Marcus,

Thank you very much for your in-depth response.

Unfortunately, incremental loads and adding more RAM are not options in our environment right now.

I tried the preceding load, but it did not make any noticeable difference.

I'll look at applying the map to only the relevant rows and see how that goes.

 

Cheers