rnorris
Partner - Creator

How to maintain Autonumberhash values across an iterative load?

Hi, much as the title suggests, I'm doing an iterative load, but something of a manual one. Essentially, I'm storing off 24 hours of data from a database table (which is 28 million rows per day). There is a field which says when the data was added to the database. 

Every ten minutes, I find the most recent timestamp from my big QVD, and then do a

select * from bigtable where data_added_timestamp > last_seen_timestamp, 

do some processing on this new lump, and then concatenate it with my big QVD, dropping anything that's now older than 24 hours. Doing it this way is crucial because the reload is very time sensitive, I can't be processing the full 24 hours each time.
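A minimal sketch of that incremental pattern (table, field, and file names here are placeholders, not from the original post):

// Find the newest timestamp already in the history
MaxStamp:
LOAD Max(data_added_timestamp) as LastSeen
FROM BigHistory.qvd (qvd);
LET vLastSeen = Peek('LastSeen', 0, 'MaxStamp');
DROP TABLE MaxStamp;

// Pull only the rows added since the last run
New:
SQL SELECT * FROM bigtable
WHERE data_added_timestamp > '$(vLastSeen)';

// Append the history, keeping only the last 24 hours
Concatenate (New)
LOAD * FROM BigHistory.qvd (qvd)
WHERE data_added_timestamp >= Now() - 1;

STORE New INTO BigHistory.qvd (qvd);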

For reasons I can't change, I have to create a key, which is a combination of 4 different columns and ends up being very large. At the moment this is fine, because I'm only running against test data with a few hundred rows. But when we get into the millions it's going to crunch, so I'll need to apply an autonumber of some description to this key.

However, as I understand it, autonumber has no way to match the numbering it produced before, which is already stored in the big historical QVD? So each time I increment the big QVD, it will be with a bunch of arbitrary keys that might clash with what's gone before?

Is there any way to maintain the 'autonumbering'?   

2 Replies
rwunderlich
Partner Ambassador/MVP

You are correct that AutoNumber values are not deterministic across script runs.  AutoNumber values are not appropriate to use as identity in QVDs. 

As an alternative, consider using just Hash128(f1, f2, f3, f4) for the keys. Those would be 16 bytes each, larger than the autonumber integers but hopefully smaller than concatenating the four fields.
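For example (f1..f4 standing in for your four key columns, and the file name assumed):

NewBatch:
LOAD
  Hash128(f1, f2, f3, f4) as Key,  // 16-byte hash, stable across reloads
  *
FROM NewBatch.qvd (qvd);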

There is a workaround to the autonumber identity limitation that involves "priming" the autonumber pump with existing values before autonumbering anything new. I've only ever used this in a few specialized cases. Before autonumbering anything in the new batch, you need to recreate the existing autonumber sequence from the QVD like this:

TempSource:
LOAD f1, f2, f3, f4, Key
FROM BigQvd.qvd (qvd);

TempKeys:
LOAD AutoNumberHash128(f1, f2, f3, f4) as PrimedKey
RESIDENT TempSource
ORDER BY Key;  // ORDER BY is only allowed on resident loads

DROP TABLES TempSource, TempKeys;

// Now autonumber the new batch

Replace AutoNumberHash128() with whatever statement you used to create the autonumbered key in the QVD.

-Rob
http://www.easyqlik.com
http://masterssummit.com
http://qlikviewcookbook.com

 

marcus_sommer

I think I would tend to do it a bit differently and apply the autonumber() not within the QVDs but only within the QVW. This means that within the QVD creation you combine these 4 fields. This might be done quite classically with something like:

F1 & '|' & F2 & '|' & F3 & '|' & F4 as X

or, in line with Rob's suggestion, you may use any of the hash###() functions. Depending on the key fields needed, you might also be able to combine them into a single number. I do this on several occasions with logic like:

ID1 * pow(10, 8) + ID2 * pow(10, 4) + ID3 + floor(ID4, 0.1)

Of course the number parts must not overlap if the values are to be truly unique, and the logic is limited to a maximum of 15 digits when used as a key; if any further calculations are needed, that drops to a maximum of 14 digits.

The hashing or the conversion to a numeric value mentioned above are not mandatory, just an additional attempt to reduce the QVD size and with it the loading times. Both aspects are probably not your biggest bottleneck.

More important: you don't need to add the new records to the big QVD; instead you might slice the data into hourly QVDs. The final QVW then loads all the slices, and afterwards you apply:

AutoNumber ‒ QlikView

on your key field and drop it afterwards.
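Under assumed file names, the hourly-slice approach in the final QVW might look like:

// Load every hourly slice via a wildcard
Data:
LOAD * FROM Data_Hour_*.qvd (qvd);

// The AutoNumber script statement converts the wide text key
// into sequential integers after all slices are loaded
AutoNumber X;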

Further, you might also consider not loading all the QVDs into the final app, but instead implementing an intermediate app that holds a fixed version of the data; your final app then loads from it with a binary load, and only the newest QVDs are loaded on top.
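A sketch of that layered setup, with the app and file names purely assumed:

// Final app: the Binary statement must be the first line of the script
Binary History.qvw;

// Then stack only the newest slices on top of the preloaded data model
Concatenate (Data)
LOAD * FROM Data_Hour_Latest.qvd (qvd);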

- Marcus