Autonumberhash versus applymap - Qlik Community

Report Inappropriate Content · ‎2013-01-30

Hi community

We have a performance issue in our Production environment and have two different options to use, and need to know which one could be best in terms of CPU / memory usage:

1. Table with more than 40 billion CDRs and around 0.5 billion unique MSISDNs. This table contains around 30 fields, amongst which is this MSISDN field.

2. Each unique MSISDN need to have a calculated field (ID) consisting in a unique ID assigned (e.g. MSISDN +447892128902 => ID 1, MSISDN +442346566724 => ID 2, and go on)

3. We see two options to obtain this calculated field:

Option 1: apply a AutoNumberHash to each MSISDN while loading the 40 billion CDRs.

Option 2: create a mapping table with all distinct MSISDNs + a RowNo() field first. When loading the 40 billion CDRs, apply a map to MSISDN to get the resulting RowNo.

Does anyone have a clue on what could use less CPU / Memory resources, a Autonumberhash for 40 billion records, or a lookup to a 0.5 billion records mapping table.

Thanks in advance!

Jason_Michaelides · ‎2013-01-30

Difficult to say without looking at the script. My suspicion would be ApplyMap() would be considerably faster in the final load but of course you have to take into account the step to generate the map.

In terms of CPU/RAM usage in the final application, they won't be any different as they are both using an integer key. The difference between the method will be in the load times.

Hope this helps,

Jason

Report Inappropriate Content · ‎2013-01-30

I forgot to say... the performance issue is only in the QV Publisher Server side, relative to the loading process.

Prod server is 16 core CPU and 192 GB, and in the past we found that when the load process uses all the RAM available, the loading process fails.

QlikView version is v9.

Theorically, Option 1 (AutoNumberHash) could be more CPU consuming, while Option 2 (ApplyMap) could be more RAM consuming, assuming that all the Mapping table with up to 0.5 billion records needs to be loaded in memory.

Our test server is quite limited and the dummy files we got are not big enough to get confirmation of the above though.

Jason_Michaelides · ‎2013-01-30

Can't fault your logic so I'd suggest 2 things:

1. Upgrade to v11 asap. In v10 the load process started to utilise multi-threading so the CPU usage will most likely be spread out over multiple cores and could significantly improve things for you.

2. Test on the production servers out of hours! Only way if you don't have similar kit...

Hope this helps,

Jason

Report Inappropriate Content · ‎2013-01-30

We wish it were so simple but the client does not want to upgrade to QV10/11 for budget reasons, and we don't have access to Production at all. So our only option is do our best guess on what will have a minor impact on performance.

Jason_Michaelides · ‎2013-01-30

Then it is indeed a case of trial and measure! Personally, I love ApplyMap() - or even better, MAP USING (if possible). Other options are a LEFT JOIN with field dropping and renaming. For large data sets I have found this works extremely quickly.

rwunderlich · ‎2013-01-30

Most importantly, you dont need autonumberHash() just the autonumber() function. No need to incur the extra CPU of doing the Hash.

-Rob

Jason_Michaelides · ‎2013-01-31

Missed that one Rob - cheers!