Skip to main content
Announcements
Qlik Introduces a New Era of Visualization! READ ALL ABOUT IT
cancel
Showing results for 
Search instead for 
Did you mean: 
anu
Partner - Contributor III
Partner - Contributor III

autonumberhash256 function in qliksense

Hi,

I came across an existing dashboard with some composite keys at load script. To reduce the size and improve performance I thought to use autonumberhash256. I read in some blogpost that it has chances of collision.App is having 3 years of data with 35 million records.

I tried understanding hash function but I didn't got it.

Count(distinct keyname) of these keys are being used in calculations of some measures.

All keys are present in same table.

1) autonumberhash256(driver_name &'-'& "driver_id") as [Driver Name Key]

2) autonumberhash256(driver_name&vehicle_number&event_type&event_start_date_time&event_end_date_time)
as [Event Unique Key]

3)autonumber(country_name&'-'& pco &'-'&cluster_name &'-'& plant_code&'-'&month_year &'-'&driver_name
&'-'& transporter_name&'-'& plant_name &'-'& business_segment) as action_Key

Can I use autonumber or autonumberhash256 functions without any issues .
Please provide some suggestions on how i can tackle this thing.

@sunny_talwar @jagan , @marcus_sommer 

Labels (1)
4 Solutions

Accepted Solutions
rubenmarin

Hi, I don't have the numbers but the probability of colision should be very low, the main advantadge of the hash version is that it's faster, but only if you use commas to separate the fields:

 autonumberhash256(driver_name,"driver_id")

Because the costly operation is the concatenation of strings.

I usually use the autonumber, but as a statement, not as a funtion.

Statement: https://help.qlik.com/en-US/sense/November2022/Subsystems/Hub/Content/Sense_Hub/Scripting/ScriptRegu...

Function: https://help.qlik.com/en-US/sense/November2022/Subsystems/Hub/Content/Sense_Hub/Scripting/CounterFun...

 

View solution in original post

marcus_sommer

There is an excellent blog-posting about the risks of a hash-collision from Barry Harmsen which seems unfortunately not available for the public anymore: www.qlikfix.com/2014/03/11/hash-functions-collisions/ Like Ruben already mentioned is the probability of a collision very low and could be neglected for the most scenarios.

Beside this has a hash no benefits in regard to the performance because it creates large strings which need more resources as numbers. Therefore just using autonumber as statement like hinted would be fine and working well.

But I suggest to consider a change of the data-model because such composite key means that you want to link fact-tables (directly or per link-table) together. Very often this could be avoided by merging all facts per joins/mappings and/or concatenate into a single table and creating a star-scheme data-model which is also the official recommended data-model.

- Marcus

View solution in original post

marcus_sommer

I think there is no general rule if a key-field is used as dimension or another field. It depends always on the data, the data-quality and the view-requirements which dimensions and appropriate developed expressions are expedient or not.

View solution in original post

rubenmarin

Hi, if discrepancy is something that worries you, use the autonumber statement.

And the count would be like any other key, it depends of the use, maybe just a count(AutoKey) returns an higher value than the facts because there could be a key in the dimensions table that doesn't have any fact, usually with just a selection or adding a filter to set analysis this kind of error is avoided.

View solution in original post

6 Replies
rubenmarin

Hi, I don't have the numbers but the probability of colision should be very low, the main advantadge of the hash version is that it's faster, but only if you use commas to separate the fields:

 autonumberhash256(driver_name,"driver_id")

Because the costly operation is the concatenation of strings.

I usually use the autonumber, but as a statement, not as a funtion.

Statement: https://help.qlik.com/en-US/sense/November2022/Subsystems/Hub/Content/Sense_Hub/Scripting/ScriptRegu...

Function: https://help.qlik.com/en-US/sense/November2022/Subsystems/Hub/Content/Sense_Hub/Scripting/CounterFun...

 

marcus_sommer

There is an excellent blog-posting about the risks of a hash-collision from Barry Harmsen which seems unfortunately not available for the public anymore: www.qlikfix.com/2014/03/11/hash-functions-collisions/ Like Ruben already mentioned is the probability of a collision very low and could be neglected for the most scenarios.

Beside this has a hash no benefits in regard to the performance because it creates large strings which need more resources as numbers. Therefore just using autonumber as statement like hinted would be fine and working well.

But I suggest to consider a change of the data-model because such composite key means that you want to link fact-tables (directly or per link-table) together. Very often this could be avoided by merging all facts per joins/mappings and/or concatenate into a single table and creating a star-scheme data-model which is also the official recommended data-model.

- Marcus

anu
Partner - Contributor III
Partner - Contributor III
Author

@rubenmarin  , @marcus_sommer 

does count of such hash key will have some discrepancy and can we show such key in straight table as a dimension. I know key are typically used to link two or more tables but customer want to show them in tables also.

marcus_sommer

I think there is no general rule if a key-field is used as dimension or another field. It depends always on the data, the data-quality and the view-requirements which dimensions and appropriate developed expressions are expedient or not.

rubenmarin

Hi, if discrepancy is something that worries you, use the autonumber statement.

And the count would be like any other key, it depends of the use, maybe just a count(AutoKey) returns an higher value than the facts because there could be a key in the dimensions table that doesn't have any fact, usually with just a selection or adding a filter to set analysis this kind of error is avoided.

anu
Partner - Contributor III
Partner - Contributor III
Author

@marcus_sommer @rubenmarin 

Now I don't have doubt wrt use of Hash key's. Thanks for the valuable suggestions. 👍

 

Anurag