Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
Join us in NYC Sept 4th for Qlik's AI Reality Tour! Register Now
cancel
Showing results for 
Search instead for 
Did you mean: 
Ch_Nwk
Contributor II
Contributor II

autonumberhash... - function combined with LOAD ... RESIDENT

Hi,

I have stumbled across the following phenomenon.
I read data from a file and use the autonumberhash function, because I have to form a key over several fields as in this example:

Tab1:
LOAD
           autonumberhash256( Field1, Field2, Field3 ) AS KeyField,
           Field1,
           Field2,
           Field3
FROM MyFile;

In the second step, I read the fields again with a LOAD ... RESIDENT again to read the fields and generate the key:

Tab2:
LOAD
           autonumberhash256( Field1, Field2, Field3 ) AS KeyField
RESIDENT Tab1;

My problem is that I now get two different hash values for the same data.

But if I embed the fields in a trim( ... ) function, I get the same key.

Tab2:
LOAD
           autonumberhash256( trim( Field1 ), trim( Field2 ), trim( Field3 ) ) AS KeyField
RESIDENT Tab1;

I cannot see any difference between the original fields and the imported fields. Reading in should not change the data either.

any idea why this behavior occurs?

Best regards

 

Christian

Labels (2)
10 Replies
VBD
Partner - Creator II
Partner - Creator II

Hello, you should check whether your keys are really the same on both sides by using field concatenation.
I would also advise you not to use the autonumberhash function but rather either autonumber (if your application is a final app) or hash() if your application is a qvd generator.
Have a look at the answer on this subject.
Best regards,

Valentin Billaud
Next Decision
henrikalmen
Specialist II
Specialist II

That's strange. What Qlik Sense version are you using? Cloud or what version?

Is there a difference if you create keys using hash256() instead of autonumberhash?

Ch_Nwk
Contributor II
Contributor II
Author

Hi,

 

many thanks for the quick answers.
I will try to clarify a few things:

- I use qlik sense in the cloud, so I'm probably using the latest version.

- I checked and checked various things before writing here. In fact, I noticed it because a LEFT JOIN via this key did not work.

- Actually, I think that one function should work. I don't like to hear: There is this function, but you should use another one.

- I know the autonumber function, but I need a key made up of several fields. From my point of view, it is more professional to pass the fields as parameters separated by commas than to pass them with '... & '|' & ...' to link them.

- only the hash..(..) function returns the same result, without the trim function, I get different values 

Best regards

 

Christian

 
VBD
Partner - Creator II
Partner - Creator II

"- I know the autonumber function, but I need a key made up of several fields. From my point of view, it is more professional to pass the fields as parameters separated by commas than to pass them with '... & '|' & ...' to link them."

More ram consumption, more spaces strorage so less optimized but as you want ...


Did you try to juste create the key without function ? Just to see if it works.

Regards

Valentin Billaud
Next Decision
henrikalmen
Specialist II
Specialist II

@VBD I'm not sure I agree with you. There could possibly be more ram usage, and even cpu usage, during reload - but in the finished data model the result is the same. Autonumberhash results in a serialnumber, as well as autonumber() does. One advantage with autonumber though, is that it's possible to add a reference (AutoID) so that you can have multiple autonumbered columns independently of one another. And the advantage with the hash functions is that you can use separated field with commas (field1, field2), you don't need to concatenate them manually (like field1&'|&field2).

@Ch_Nwk I do agree with VBD that you should try creating the key without hashing/autonumbering to actually see what's going on. So instead of autonumberhash256( Field1, Field2, Field3 ) AS KeyField you do this: Field1&'_'&Field2&'_'&Field2 as KeyField so that you can look at the result without it being "hidden".

marcus_sommer

Both autonumberhash256() loads aren't completely the same. The first one takes the field-values directly from the external source and the second one from internal stored values. Depending on the source-type + the load-order + various settings (interpretation/system-variables) may the data-interpretaion be different which may result in (slightly) different field-values.

Beside the already made suggestion of creating another key manually like: F1 & '|' & F2 ... I would apply a separat recno() and rowno() to each load and also adding something like: len(F1) as L1, ...

With it you could directly compare on a record-level what's going on and having a starting point for further investigations if anything isn't like expected.

 
rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

As @marcus_sommer pointed out, the fields are not exactly the same. As you've posted pseudo code, i can't say exactly why they may be different in this case. However, I suggest that AutoNumberHash*() is an old school method that I would not use any more. Instead of using an AutoNumber() function to create keys, use the  AutoNumber Statement  instead.

Tab1:
LOAD
           Hash128( Field1, Field2, Field3 ) AS KeyField,
           Field1,
           Field2,
           Field3
FROM MyFile;

Autonumber KeyField;

I use Hash128() instead of Hash256() as the likelihood of a hash collision is already incredibly low with 128. 

If you're wondering if this technique is performant, yes, it's actually the best in my testing. 

rwunderlich_0-1745594309662.png

 

-Rob

Ch_Nwk
Contributor II
Contributor II
Author

Hello,

 

and many thanks for the detailed response.

"... the fields are not exactly the same...",

I think so too, but that's exactly my problem.
After a LOAD ... RESIDENT the content of the fields doesn't seem to be the same anymore. That would mean LOAD RESIDENT changes the fields.

 

In this case I can work around it but I also use the autonumberhash*( ) function in another place and therefore this behavior makes me uneasy.


@rob
I need to compare tables and apply only the changed records. Since some of the tables have over 200 columns and it doesn't matter which one has changed, I thought I'd make a hash of all the columns and compare the hash values.

Here I could also use the hash128(...) function, but then I would have to do a string comparison.
the autonumberhash*(...) function seems to be slower, but then I have an integer comparison, which should be faster than the string comparison.


What do you think about this?

Best regards

 

Christian

rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

I would be surprised if you would notice the difference between a string and integer comparison in a script run. But you do notice the difference when using AutoNumberHash(). 

"That would mean LOAD RESIDENT changes the fields." I don't think that's what Marcus was saying. What I understood was that one was using the input data from myfile and the other was using the output values from Tab1.

Hard to say why there may be a difference without seeing your entire script.

-Rob