Today I decided to blog about the Autonumber function that can be used to create a “compact memory representation of a complex key.” Having recently learned about this function, I realized there have been times in the past when this would have been helpful to use. For instance, when I need to build a link table in my data model, I often create keys that I use to link the tables. Sometime these key fields are lengthy and are a combination of 3 or 4 fields. In this blog, I will show you how you can use the Autonumber function to create a “compact memory representation of a complex key.”
Assume I load a data set that looks like this:
And I want to load another data set that looks like this:
These two data sets have the same first four fields so if I were to load them as is, I would get a synthetic table in my data model. To avoid that I will set up a key field in each of the tables that includes the FoodCategory, StoreNo, Year and Month fields. This key field will be the field that links these two tables. I will do this using a preceding load when I load both of these tables. The script would look like this:
In the first table, I am using a preceding load to load all fields and then I am using the Autonumber function to create a key field that represents the four fields: FoodCategory, StoreNo, Year and Month. I am doing the same thing in the second table I am loading but the difference here is that I am not loading the key fields. By not loading the key fields, I am preventing a synthetic table from being loaded. The end result looks like this:
Notice the FSYMkey field. In this example, it is a unique integer that represents a larger expression. In the past, I would have created the key field like this (see the FSYMkey2 field in the table below):
FSYMkey2 is a more complex field that would have taken up more memory. This example is small but if you had thousands of unique key fields like this, the consumed memory would add up. By using the Autonumber function, I was able to use an integer to represent a long string thus minimizing the memory usage in my app. This is one of many tricks that can be used to reduce the memory usage in your app. Henric Cronstrom has some other ideas in his Symbol Tables and Bit-Stuffed Pointers blog. Check it out.
Agreed. Using Autonumber() like this is, memory wise, phenomenally efficient as the values do not even get stored in the Symbol Tables - i.e. they take up zero bytes in the Symbol Tables.
Please, remember to do not use AutoNumber function when you want to store data in QVD file. AutoNumber function generate always number started from 1 for each load. So, different load time can generate different result number from the AutoNumber.
To reduce data for different fieds generated by AutoNumber function use AutoNumber(field_1 & ... & field_N, 'AutoNumberKey').
Remember to always use the second parm to autonumber -- the "autoid" to ensure your values are sequential when autonumbering more than one field. Otherwise you will lose the symbol table optimization.
Here each Key starts with 1 and generating sequential integers for individual fields.
Memory Consumption
See the memory consumption of the above two approaches. Although the app size is very small but still there has an impact of using Autonumber() with the autoid.
Continuing with the example of Sanders, I think you can solve this using the function AutoNumberHashXXX.
Load *, AutoNumber(F1&F2,'AN1') as AN,
AutoNumber(F1&'-'&F2,'AN2') as AN2,
AutoNumberHash128(F1,F2,'AN3') as AN3;
LOAD * INLINE [
F1, F2
101, 12
1011, 2
];
As you can see the first case as mention before, it gives you the same result, but using a separator or the function AutoNumberHash128, it solve the problem.
Yes, but as mentioned prior, don't use autonumberhashxxx when building composite keys in your QVDs, which often get built and loaded at different times. So one approach is to use the hashxxx(field1,field2...fieldn) when building composite keys in QVDs (fields can be listed without a separator) and then use autonumber() on the keys when loading from QVDs in order to conserve memory. Since autonumber is only referencing one field (the prebuilt hash key), you don't need the autoID, but it's good practice to get in the habit in case you are autonumbering more than one field.
Per the Qlik help site:
You can only connect autonumberhash256 keys that have been generated in the same data load, as the integer is generated according to the order the table is read. If you need to use keys that are persistent between data loads, independent of source data sorting, you should use thehash128, hash160 or hash256 functions.
"Since autonumber is only referencing one field (the prebuilt hash key), you don't need the autoID".
I don't think that statement is correct. The AutoID has nothing to do with the number of fields you are using as input parameters. Rather AutoID is required when you are creating more than one output fieldname -- to ensure sequential numbers.