Not applicable

Incremental Loading and AutoNumberHash256 for a very large application

Hello,

I have a problem - a BIG one.

I have a dashboard that integrates a lot of disparate sources, and the link table has 6 keys. Each of them is a composite key, and I used AutoNumberHash256 to minimize the size.

Now the problem is:

1. We need to load incrementally.

2. The link table has over 120 million records.

I am now aware that AutoNumberHash256 will not work with incremental loading, after seeing collisions in the key: the same key was generated for two different periods even though the original values were different.
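
To illustrate, a minimal sketch of the collision (field names and values are invented):

// Full reload sees two composite keys:
Run1:
LOAD
    F1,
    F2,
    AutoNumberHash256(F1, F2) AS %Key
INLINE [
F1, F2
A, X
B, Y
];
// (A,X) gets %Key = 1 and (B,Y) gets %Key = 2.
// A later incremental run that only sees a new row (C,Z) starts the
// internal counter at 1 again, so (C,Z) also gets %Key = 1 and
// collides with (A,X) once the old and new QVDs are combined.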

I thought of using Hash256 at the QVD generator level for the incremental load and AutoNumberHash256 in the model, but I am not sure it will perform well.

I do not know of a way to keep the autonumbering persistent across different (incremental) runs.

I can't move this to ETL tools, so I have to resolve this in QlikView.

Any bright ideas?

11 Replies
Not applicable
Author

Hello, both the AutoNumber and AutoNumberHash functions can return the same key value for different underlying values on different reload instances. So you can't create the key number in the QVD loader, but you can create it while consuming the QVD in the application layer.

So, while creating the QVD, add the composite key without the AutoNumber function.

Ex:

LOAD
    A,
    B,
    C,
    D,
    E,
    F,
    A & '|' & B & '|' & C & '|' & D AS %PrimaryKey
FROM Table
WHERE TimePeriodFilter;

With this approach the QVD size may increase, but that's not an issue.

Method 1:

While consuming the QVD, add the AutoNumber function:


LOAD
    A,
    B,
    C,
    D,
    E,
    F,
    AutoNumber(%PrimaryKey) AS %AutoFactKey
FROM QVD_NAME.qvd (qvd);


The downside is that the QVD load is not optimized (the AutoNumber call forces an unoptimized load).


Method 2:

Use a resident load to create the AutoNumber key and left join it back to the main table:


Fact:
LOAD
    A,
    B,
    C,
    D,
    E,
    F,
    %PrimaryKey
FROM QVD_NAME.qvd (qvd);

Left Join (Fact)
LOAD DISTINCT
    %PrimaryKey,
    AutoNumber(%PrimaryKey) AS %AutoFactKey
RESIDENT Fact;


The downside is that you have to use a resident load.


Try the two methods above, measure the reload times, and use whichever works best for you.

Not applicable
Author

Thanks for the response - but I have more than one key to take care of, so I will have to see which works better in this case.

Gysbert_Wassenaar

I do not know of a way to keep the autonumbering persistent across different (incremental) runs.

That's because it can't be done reliably with the autonumber* functions. You will have to use the Hash* functions, which do create values that stay unique across different runs. You can then use the AutoNumber function in the final load (where you put everything together for the dashboard) to replace the long hash keys with smaller numeric autonumber values.
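
A minimal sketch of that split (the source table, file names and the vLastReload variable are placeholders, not your actual names):

// QVD generator: build a stable text key for the new rows only
NewFacts:
LOAD
    A, B, C, D,
    Hash256(A, B, C, D) AS %FactKey;
SQL SELECT A, B, C, D
FROM SourceTable
WHERE ModifiedDate >= '$(vLastReload)';

// Append the history, which already carries the hash key, and store back
Concatenate (NewFacts)
LOAD A, B, C, D, %FactKey
FROM Fact_Historic.qvd (qvd);

STORE NewFacts INTO Fact_Historic.qvd (qvd);

// In the dashboard document: swap the long hash for a compact integer.
// This load is unoptimized because of the AutoNumber() call.
Fact:
LOAD
    A, B, C, D,
    AutoNumber(%FactKey) AS %FactKeyNum
FROM Fact_Historic.qvd (qvd);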


talk is cheap, supply exceeds demand
Not applicable
Author

Thanks Gysbert - that's what I am scripting right now.

But overall, won't that slow down the load times, given that I will be loading all of two years' data into the model and generating the autonumber from the hash? That's my worry.

Gysbert_Wassenaar

If you do incremental loading, then you only have to do it once for the historic data. You will add the hashed key to the historic QVDs, so you won't have to recalculate them every time; you'll only have to calculate the hash keys for the new data. I'm afraid that turning the hash keys into autonumber values does mean that you won't have an optimized load from the QVD.
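
The one-time backfill of the history could look like this (file names are placeholders):

// Run once: stamp the hash key onto the existing historic QVD
Historic:
LOAD
    A, B, C, D,
    Hash256(A, B, C, D) AS %FactKey
FROM Fact_Historic_Raw.qvd (qvd);

STORE Historic INTO Fact_Historic.qvd (qvd);
DROP TABLE Historic;

// From then on, Hash256() only ever runs on the new increment.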


talk is cheap, supply exceeds demand
marcus_sommer

An autonumber hash will require a lot of resources, and therefore it could be better to use composite keys like in the example from dathu.qv. An alternative might be to create a numeric key (using only ID fields for it) like this one:

([StoreID] * pow(10, 4)) + ([OrderID] * pow(10, 9)) + [OrderLineID] as [UniqueKey]

which worked great in my case.
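
To make the digit packing concrete (purely illustrative numbers, assuming StoreID fits in 5 digits and OrderLineID in 4): with StoreID = 123, OrderID = 4567 and OrderLineID = 89,

4567 * pow(10, 9) = 4,567,000,000,000
 123 * pow(10, 4) =         1,230,000
               89 =                89
      [UniqueKey] = 4,567,001,230,089

Each ID occupies its own block of digits, so the key stays unique as long as each ID stays within its block and the total stays below 2^53 (about 9 * 10^15), where QlikView still stores exact integers.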

Further, I would consider changing the whole approach of the data model, because using such a large link table with so many keys might not be the most performant way to handle a large dataset.

- Marcus

Not applicable
Author

Great idea, Marcus.

Also, why not do the same for OrderLineID?

I will need to check whether every field in the composite key is a number, though.

Just thinking aloud: if I use hash(string field), it should always return the same hash value, I guess, and then I can use the above trick?

marcus_sommer

I don't understand exactly what you mean by:

Also, why not do the same for OrderLineID?

Maybe it's not the same in your environment, but in our case it's a number with at most 3 digits.

Regarding your hash approach: I would try to avoid it under nearly all circumstances, because it creates a (long) string, which means a lot of disk space and RAM will be needed for it.

- Marcus

Not applicable
Author

I mean: why no OrderLineID * pow(10, 9)?