Hello,
I have a problem - a BIG one.
I have a dashboard that integrates a lot of disparate sources, and the link table I have has 6 keys. Each of them is a composite key, and I had used AutoNumberHash256 to minimize the size.
Now the problem is:
1. We need to load incrementally.
2. The link table has over 120 million records.
I am aware now that AutoNumberHash256 will not work with incremental loading, after I saw collisions in the key: the same key was generated for two different periods where the original values were different.
I thought of using Hash256 at the QVD-generator level for incremental loading and AutoNumberHash256 in the model, but I am not too sure it will perform well.
I do not know of a way to keep the autonumbering persistent across different (incremental) runs.
I can't move this to ETL tools, so I have to resolve this in QV.
Any bright ideas?
Hello, both the AutoNumber and AutoNumberHash functions will return the same key value for different values across different reload instances, so you can't create the numeric key in the QVD loader. But you can create it while consuming the QVD in the application layer.
So, while creating the QVD, add the composite key without the AutoNumber function.
Ex:
SELECT
A,
B,
C,
D,
E,
F,
A&'|'&B&'|'&C&'|'&D AS %PrimaryKey
FROM Table
WHERE TimePeriodFilter;
With this approach the QVD size may increase, but that's not an issue.
Method 1:
While consuming the QVD, apply the AutoNumber function.
LOAD
A,
B,
C,
D,
E,
F,
AutoNumber(%PrimaryKey) AS %AutoFactKey
FROM QVD_NAME.qvd (qvd);
Downside: the QVD load is not optimized.
Method 2:
Use a resident load to create the AutoNumber key and left join it to the main table.
Fact:
LOAD
A,
B,
C,
D,
E,
F,
%PrimaryKey
From QVD_NAME.qvd (qvd);
Left Join(Fact)
LOAD DISTINCT
%PrimaryKey,
AutoNumber(%PrimaryKey) as %AutoFactKey
Resident Fact;
Downside: you have to use a resident load.
Try the above two methods, measure the reload times, and use whichever works best for you.
Thanks for the response - but I have more than one key to take care of, so I will have to see which works better in this case.
I do not know of a way to keep the autonumbering persistent across different (incremental) runs.
That's because it can't be done reliably with the autonumber* functions. You will have to use the Hash* functions, which create values that stay unique across different runs. You can then use the AutoNumber function in the final load (where you put everything together for the dashboard workbooks) to replace the long hash keys with smaller numeric autonumber values.
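As a minimal sketch of that two-stage approach (the field, file, and variable names here are illustrative, not from the thread): the QVD generator stores a deterministic Hash256 key, and the final dashboard load replaces it with a compact AutoNumber value.

```qlik
// Stage 1 - QVD generator: a hash is deterministic, so the same input
// values produce the same key in every reload run.
Fact_New:
LOAD
    A, B, C, D, E, F,
    Hash256(A, B, C, D) AS %FactKeyHash
FROM Source;
STORE Fact_New INTO Fact.qvd (qvd);

// Stage 2 - dashboard app: shrink the long hash string to a small
// sequential integer. The second argument names the counter namespace,
// useful when you have several keys to autonumber.
Fact:
LOAD
    A, B, C, D, E, F,
    AutoNumber(%FactKeyHash, 'Fact') AS %AutoFactKey
FROM Fact.qvd (qvd);
```

The AutoNumber sequence is rebuilt on every reload of the final app, which is safe here because it is never stored back into a QVD.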
Thanks Gysbert - that's what I am scripting right now.
But overall, won't that slow down the load times, given that I will be loading all two years of data into the model and generating the autonumber keys? That's my worry.
If you do incremental loading then you will have to do it only once for the historic data. You will add the hashed key to the historic QVDs, so you won't have to recalculate it every time; you'll only have to calculate the hash keys for the new data. I'm afraid that turning the hash keys into autonumber values does mean you won't have an optimized load from the QVD.
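A sketch of that incremental pattern (file names, field names, and the `vLastReload` variable are assumptions for illustration): only the new slice is hashed, and the history comes back in untouched.

```qlik
// Hash only the new records since the last run.
Fact:
LOAD
    A, B, C, D, E, F,
    Hash256(A, B, C, D) AS %FactKeyHash
FROM Source
WHERE Period >= '$(vLastReload)';

// Append the history. The hash keys were computed in earlier runs and
// are already stored in the QVD, so no transformation is applied here
// and the load stays optimized.
Concatenate (Fact)
LOAD *
FROM Fact_Historic.qvd (qvd);

STORE Fact INTO Fact_Historic.qvd (qvd);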
An autonumber hash will require a lot of resources, so it could be better to use composite keys like in the example from dathu.qv. An alternative might be to create a numeric key (using only ID fields for it) like this one:
([StoreID] * pow(10, 4)) + ([OrderID] * pow(10, 9)) + [OrderLineID] as [UniqueKey]
which worked great in my case.
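As a worked illustration of that digit-shifting key (the field widths are assumptions: OrderLineID at most 4 digits, StoreID at most 5, sample values invented):

```qlik
Orders:
LOAD
    StoreID, OrderID, OrderLineID,
    (StoreID * pow(10, 4)) + (OrderID * pow(10, 9)) + OrderLineID AS UniqueKey
FROM Orders.qvd (qvd);

// Example: StoreID = 123, OrderID = 456789, OrderLineID = 12
//   OrderID     * pow(10,9) = 456789000000000   (digits 10 and up)
//   StoreID     * pow(10,4) =         1230000   (digits 5-9)
//   OrderLineID             =              12   (digits 1-4)
//   UniqueKey               = 456789001230012
```

One caveat: Qlik numbers are 64-bit doubles, so the key stays exact only up to about 9*10^15; the multipliers must be chosen so the largest combined value stays below that.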
Further, I would consider changing the whole approach of the data model, since such a large link table with so many keys might not be the most performant way to handle a large dataset.
- Marcus
Great idea Marcus
Also, why not do the same for OrderLineID ?
I will need to check whether every field in the composite key is a number, though.
Just thinking aloud: if I use Hash(string field) it should always return the same hash value, I guess, and then I can use the above trick?
I don't understand exactly what you mean with:
Also, why not do the same for OrderLineID ?
Maybe it's not the same as in your environment, but for us it's at most a 3-digit number.
Regarding your hash approach: I would try to avoid it under nearly all circumstances, since it creates a (long) string, which means a lot of disk space and RAM will be needed for it.
- Marcus
I mean: why not OrderLineID * pow(10,9)?