Ch_Nwk
Contributor II

autonumberhash... - function combined with LOAD ... RESIDENT

Hi,

I have stumbled across the following phenomenon.
I read data from a file and use the autonumberhash256() function, because I have to build a key over several fields, as in this example:

Tab1:
LOAD
           autonumberhash256( Field1, Field2, Field3 ) AS KeyField,
           Field1,
           Field2,
           Field3
FROM MyFile;

In a second step, I read the same fields with a LOAD ... RESIDENT and generate the key again:

Tab2:
LOAD
           autonumberhash256( Field1, Field2, Field3 ) AS KeyField
RESIDENT Tab1;

My problem is that I now get two different hash values for the same data.

But if I wrap the fields in a trim( ... ) function, I get the same key:

Tab2:
LOAD
           autonumberhash256( trim( Field1 ), trim( Field2 ), trim( Field3 ) ) AS KeyField
RESIDENT Tab1;
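
One way to narrow this down (a diagnostic sketch I added, not part of the original question; the len() comparison is an assumption about where the difference might hide) is to compute both key variants side by side and expose any invisible leading/trailing blanks:

CheckTab:
LOAD
           autonumberhash256( Field1, Field2, Field3 ) AS KeyRaw,
           autonumberhash256( trim( Field1 ), trim( Field2 ), trim( Field3 ) ) AS KeyTrimmed,
           // a non-zero value reveals padding that trim() removes
           len( Field1 ) - len( trim( Field1 ) ) AS PaddingField1
RESIDENT Tab1;

If PaddingField1 (or the analogous checks for Field2/Field3) is ever non-zero, the raw values really do differ from their trimmed counterparts, which would explain the diverging hash values.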

I cannot see any difference between the original fields and the reloaded fields, and a resident load should not change the data either.

Any idea why this behavior occurs?

Best regards


Christian

10 Replies
marcus_sommer

I have never done this kind of table comparison, so I'm not sure this approach is the most suitable one. Here are a few thoughts ...

First, the optimization attempt with autonumber(): the generated numbers are only consistent within a single reload, so both table runs (old + new) have to be within the same script. This rules out nearly all options for implementing an incremental logic and for loading the older version prepared and optimized from a QVD.
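
A sketch of what an alternative could look like (table/file names are assumed): hash256(), unlike the autonumber* functions, returns a stable text hash that survives across reloads and can therefore be stored in a QVD and compared in a later script run:

NewData:
LOAD
    hash256( Field1, Field2, Field3 ) AS KeyField,  // stable across reloads, unlike autonumberhash256()
    Field1,
    Field2,
    Field3
FROM MyFile;

STORE NewData INTO NewData.qvd (qvd);

The trade-off is that hash256() keys are text values and therefore consume more memory than the compact integers produced by autonumberhash256().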

The next point concerns the join logic for detecting changes within a record: joins are rather heavy operations, and the resulting matches / non-matches need to be evaluated in n follow-up steps (adding the changed/new records and removing/flagging the outdated ones).

Alternatives to the joins may be mappings, or, probably much better, checking the hashed values against each other with exists().
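
For example (a sketch with assumed table names OldData and NewData), the exists() check could look like this:

OldKeys:
LOAD autonumberhash256( Field1, Field2, Field3 ) AS KeyField
RESIDENT OldData;

NewOrChanged:
LOAD *
RESIDENT NewData
// keep only rows whose hash key was not seen in the old version
WHERE NOT exists( KeyField, autonumberhash256( Field1, Field2, Field3 ) );

// OldKeys only served as a lookup and can be dropped afterwards
DROP TABLE OldKeys;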

Another consideration (surely not suitable for all scenarios, but certain ones may benefit) might be to transform the tables into a stream-like data structure with a crosstable statement before the checks, for example:

crosstable(Field, Value, 3)
LOAD recno() AS RecNo, rowno() AS RowNo, * FROM Source;

Crosstable statements are also quite heavy transformations, and by default they remove NULLs as values, which affects performance, the number of rows returned per record, and the possible methods for detecting changes.

This means that, depending on the table sizes/structures and further requirements, implementing n different check logics might yield the best performance (which of course comes with disadvantages in complexity and development/maintenance effort).