Skip to main content
Announcements
Have questions about Qlik Connect? Join us live on April 10th, at 11 AM ET: SIGN UP NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Autonumberhash versus applymap

Hi community

We have a performance issue in our Production environment and have two different options to use, and need to know which one could be best in terms of CPU / memory usage:

1. Table with more than 40 billion CDRs and around 0.5 billion unique MSISDNs. This table contains around 30 fields, amongst which is this MSISDN field.

2. Each unique MSISDN need to have a calculated field (ID) consisting in a unique ID assigned (e.g. MSISDN +447892128902 => ID 1, MSISDN +442346566724 => ID 2, and go on)

3. We see two options to obtain this calculated field:

     Option 1: apply a AutoNumberHash to each MSISDN while loading the 40 billion CDRs.

     Option 2: create a mapping table with all distinct MSISDNs + a RowNo() field first. When loading the 40 billion CDRs, apply a map to MSISDN to get the resulting RowNo.

Does anyone have a clue on what could use less CPU / Memory resources, a Autonumberhash for 40 billion records, or a lookup to a 0.5 billion records mapping table.

Thanks in advance!

7 Replies
Jason_Michaelides
Luminary Alumni
Luminary Alumni

Difficult to say without looking at the script.  My suspicion would be ApplyMap() would be considerably faster in the final load but of course you have to take into account the step to generate the map.

In terms of CPU/RAM usage in the final application, they won't be any different as they are both using an integer key. The difference between the method will be in the load times.

Hope this helps,

Jason

Not applicable
Author

I forgot to say... the performance issue is only in the QV Publisher Server side, relative to the loading process.

Prod server is 16 core CPU and 192 GB, and in the past we found that when the load process uses all the RAM available, the loading process fails.

QlikView version is v9.

Theorically, Option 1 (AutoNumberHash) could be more CPU consuming, while Option 2 (ApplyMap) could be more RAM consuming, assuming that all the Mapping table with up to 0.5 billion records needs to be loaded in memory.

Our test server is quite limited and the dummy files we got are not big enough to get confirmation of the above though.

Jason_Michaelides
Luminary Alumni
Luminary Alumni

Can't fault your logic so I'd suggest 2 things:

1. Upgrade to v11 asap. In v10 the load process started to utilise multi-threading so the CPU usage will most likely be spread out over multiple cores and could significantly improve things for you.

2. Test on the production servers out of hours!  Only way if you don't have similar kit...

Hope this helps,

Jason

Not applicable
Author

We wish it were so simple but the client does not want to upgrade to QV10/11 for budget reasons, and we don't have access to Production at all. So our only option is do our best guess on what will have a minor impact on performance.

Jason_Michaelides
Luminary Alumni
Luminary Alumni

Then it is indeed a case of trial and measure!  Personally, I love ApplyMap() - or even better, MAP USING (if possible).  Other options are a LEFT JOIN with field dropping and renaming. For large data sets I have found this works extremely quickly.

rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

Most importantly, you dont need autonumberHash() just the autonumber() function. No need to incur the extra CPU of doing the Hash.

-Rob

Jason_Michaelides
Luminary Alumni
Luminary Alumni

Missed that one Rob - cheers!