Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi,
I have stumbled across the following phenomenon.
I read data from a file and use the autonumberhash function, because I have to form a key over several fields as in this example:
Tab1:
LOAD
autonumberhash256( Field1, Field2, Field3 ) AS KeyField,
Field1,
Field2,
Field3
FROM MyFile;
In the second step, I read the fields again with a LOAD ... RESIDENT again to read the fields and generate the key:
Tab2:
LOAD
autonumberhash256( Field1, Field2, Field3 ) AS KeyField
RESIDENT Tab1;
My problem is that I now get two different hash values for the same data.
But if I embed the fields in a trim( ... ) function, I get the same key.
Tab2:
LOAD
autonumberhash256( trim( Field1 ), trim( Field2 ), trim( Field3 ) ) AS KeyField
RESIDENT Tab1;
I cannot see any difference between the original fields and the imported fields. Reading in should not change the data either.
any idea why this behavior occurs?
Best regards
Christian
Hello, you should check whether your keys are really the same on both sides by using field concatenation.
I would also advise you not to use the autonumberhash function but rather either autonumber (if your application is a final app) or hash() if your application is a qvd generator.
Have a look at the answer on this subject.
Best regards,
That's strange. What Qlik Sense version are you using? Cloud or what version?
Is there a difference if you create keys using hash256() instead of autonumberhash?
Hi,
many thanks for the quick answers.
I will try to clarify a few things:
- I use qlik sense in the cloud, so I'm probably using the latest version.
- I checked and checked various things before writing here. In fact, I noticed it because a LEFT JOIN via this key did not work.
- Actually, I think that one function should work. I don't like to hear: There is this function, but you should use another one.
- I know the autonumber function, but I need a key made up of several fields. From my point of view, it is more professional to pass the fields as parameters separated by commas than to pass them with '... & '|' & ...' to link them.
- only the hash..(..) function returns the same result, without the trim function, I get different values
Best regards
Christian
"- I know the autonumber function, but I need a key made up of several fields. From my point of view, it is more professional to pass the fields as parameters separated by commas than to pass them with '... & '|' & ...' to link them."
More ram consumption, more spaces strorage so less optimized but as you want ...
Did you try to juste create the key without function ? Just to see if it works.
Regards
@VBD I'm not sure I agree with you. There could possibly be more ram usage, and even cpu usage, during reload - but in the finished data model the result is the same. Autonumberhash results in a serialnumber, as well as autonumber() does. One advantage with autonumber though, is that it's possible to add a reference (AutoID) so that you can have multiple autonumbered columns independently of one another. And the advantage with the hash functions is that you can use separated field with commas (field1, field2), you don't need to concatenate them manually (like field1&'|&field2).
@Ch_Nwk I do agree with VBD that you should try creating the key without hashing/autonumbering to actually see what's going on. So instead of autonumberhash256( Field1, Field2, Field3 ) AS KeyField you do this: Field1&'_'&Field2&'_'&Field2 as KeyField so that you can look at the result without it being "hidden".
Both autonumberhash256() loads aren't completely the same. The first one takes the field-values directly from the external source and the second one from internal stored values. Depending on the source-type + the load-order + various settings (interpretation/system-variables) may the data-interpretaion be different which may result in (slightly) different field-values.
Beside the already made suggestion of creating another key manually like: F1 & '|' & F2 ... I would apply a separat recno() and rowno() to each load and also adding something like: len(F1) as L1, ...
With it you could directly compare on a record-level what's going on and having a starting point for further investigations if anything isn't like expected.
As @marcus_sommer pointed out, the fields are not exactly the same. As you've posted pseudo code, i can't say exactly why they may be different in this case. However, I suggest that AutoNumberHash*() is an old school method that I would not use any more. Instead of using an AutoNumber() function to create keys, use the AutoNumber Statement instead.
Tab1:
LOAD
Hash128( Field1, Field2, Field3 ) AS KeyField,
Field1,
Field2,
Field3
FROM MyFile;
Autonumber KeyField;
I use Hash128() instead of Hash256() as the likelihood of a hash collision is already incredibly low with 128.
If you're wondering if this technique is performant, yes, it's actually the best in my testing.
-Rob
Hello,
and many thanks for the detailed response.
"... the fields are not exactly the same...",
I think so too, but that's exactly my problem.
After a LOAD ... RESIDENT the content of the fields doesn't seem to be the same anymore. That would mean LOAD RESIDENT changes the fields.
In this case I can work around it but I also use the autonumberhash*( ) function in another place and therefore this behavior makes me uneasy.
@rob
I need to compare tables and apply only the changed records. Since some of the tables have over 200 columns and it doesn't matter which one has changed, I thought I'd make a hash of all the columns and compare the hash values.
Here I could also use the hash128(...) function, but then I would have to do a string comparison.
the autonumberhash*(...) function seems to be slower, but then I have an integer comparison, which should be faster than the string comparison.
What do you think about this?
Best regards
Christian
I would be surprised if you would notice the difference between a string and integer comparison in a script run. But you do notice the difference when using AutoNumberHash().
"That would mean LOAD RESIDENT changes the fields." I don't think that's what Marcus was saying. What I understood was that one was using the input data from myfile and the other was using the output values from Tab1.
Hard to say why there may be a difference without seeing your entire script.
-Rob