One thing we often need to consider is the size of the QlikView data model in memory. In this post, I’ve included a few ideas for reducing the size. Consider the following simple script:
LOAD [Student Id], [Unique Identifier], [Email Address] FROM ...
This example script loads approx 199k records identified by a [Unique Identifier]; there are approx. 169k [EmailAddress]es and 175k [Student Id] numbers. As we can see these fields take up about 13MB in memory:
[Unique Identifier] is a 12 character primary key and is made up of the [Student Id] concatenated with 2 sequence numbers e.g. ‘1234567’8 & ‘01’ & ’02’ to give’ 123456780102’. [Student Id] should always be 8 characters and the [Email Address] is of the standard form <user name> @ <domain name> and is variable in length.
This is a key field used solely to associated tables in the data mode. It is never presented to the user so there is no need for it to consume so much space in the data model as long as we maintain its uniqueness. This can be done by applying one of the AutoNumber functions that QlikView provides. In this case we can use AutoNumber Hash128:
This is not a key field and is not unique, but is presented to the user. Therefore we need to maintain the original value of it, so we can’t use one of the AutoNumber functions to save space. The id consists of 8 digits, which means that we need (in theory) to be able to store a number up to 99,999,999. However, we can take advantage of the fact that QlikView stores each occurrence of a value only once and internally uses pointers to link records to particular values. We do this by splitting the [Student Id] into two 4 digit components ID_1 and ID_2, which now each only have to store a maximum value of 9,999 as well as rationalising the space used. For example, consider two id numbers: 01024567 and 01034567. If we stored [Student ID] in its original form we would need to store two fields capable of holding the two 8 digit id numbers. If we split the id into two parts, ID_1 and ID_2, QlikView would need to store 0102, and 0103 for ID_1, but 4567 for ID_2 only once as it is the same in both cases. Roughly speaking, this means that QlikView now has to store only 12 digits rather than 16. Over thousands of records this starts to make a difference.
For presentation, we can reconstruct the [Student ID] by concatenating:
ID_1 & ID_2
Again this is not a key field and is presented to the user. We can use the same principle as with [Student ID] to save space by splitting the address into two components: [Email User] and [EmailDomain]. Because many records will share email domains e.g. gmail.com, we save space in the data model. For presentation, we can reconstruct the email address by concatenating: