Qlik Community

QlikView Documents

Documents for QlikView related information.

Announcements
Modernize Your QlikView Deployment webinar, Nov. 3rd. REGISTER

Reducing the Size of a Data Model

Creator III
Creator III

Reducing the Size of a Data Model

One thing we often need to consider is the size of the QlikView data model in memory. In this post, I’ve included a few ideas for reducing the size. Consider the following simple script:

LOAD
  [Student Id],
  [Unique Identifier],
 
[Email Address]
FROM ...


This example script loads approx 199k records identified by a [Unique Identifier]; there are approx. 169k [Email Address]es and 175k [Student Id] numbers. As we can see these fields take up about 13MB in memory:

Class

Type

Id

Count

Size

Bytes

         13,371,456

Database

Field

Unique Identifier

199218

26

5,179,381

Database

Field

Email Address

169780

29.72

5,046,275

Database

Field

Student Id

174774

18

3,145,800

[Unique Identifier] is a 12 character primary key and is made up of the [Student Id] concatenated with 2 sequence numbers e.g. ‘1234567’8 & ‘01’ & ’02’ to give’ 123456780102’. [Student Id] should always be 8 characters and the [Email Address] is of the standard form <user name> @ <domain name> and is variable in length.

Unique Identifier

This is a key field used solely to associated tables in the data mode. It is never presented to the user so there is no need for it to consume so much space in the data model as long as we maintain its uniqueness. This can be done by applying one of the AutoNumber functions that QlikView provides. In this case we can use AutoNumber Hash128:

     AutoNumberHash128([Unique Identifier]) AS [Unique Identifier],

Student ID

This is not a key field and is not unique, but is presented to the user. Therefore we need to maintain the original value of it, so we can’t use one of the AutoNumber functions to save space. The id consists of 8 digits, which means that we need (in theory) to be able to store a number up to 99,999,999. However, we can take advantage of the fact that QlikView stores each occurrence of a value only once and internally uses pointers to link records to particular values. We do this by splitting the [Student Id] into two 4 digit components ID_1 and ID_2, which now each only have to store a maximum value of 9,999 as well as rationalising the space used. For example, consider two id numbers: 01024567 and 01034567. If we stored [Student ID] in its original form we would need to store two fields capable of holding the two 8 digit id numbers. If we split the id into two parts, ID_1 and ID_2, QlikView would need to store 0102, and 0103 for ID_1, but 4567 for ID_2 only once as it is the same in both cases.  Roughly speaking, this means that QlikView now has to store only 12 digits rather than 16. Over thousands of records this starts to make a difference.

For presentation, we can reconstruct the [Student ID] by concatenating:

ID_1 & ID_2

Email Address

Again this is not a key field and is presented to the user. We can use the same principle as with [Student ID] to save space by splitting the address into two components: [Email User] and [Email Domain]. Because many records will share email domains e.g. gmail.com, we save space in the data model. For presentation, we can reconstruct the email address by concatenating:

[Email User] & ‘@’ & [Email Domain]

Our modified script for this example now becomes:


LOAD
Left([Student Id],4) AS ID_1,
Right([Student Id],4) AS ID_2,
AutoNumberHash128([Unique Identifier]) AS [Unique Identifier],
SubField([Email Address], '@', 1) AS [Email User],
SubField ([Email Address], '@', 2) AS [Email Domain]
FROM ...

So, although we now have more fields in our data model, the results show that the size of the model has been reduced by around a third.

Class

Type

Id

Count

Size

Bytes

4,743,745

Database

Field

Email User

168516

17.21

2,900,475

Database

Field

Unique Identifier

199218

8.00

1,593,744

Database

Field

ID_2

10001

14.00

140,010

Database

Field

Email Domain

5033

21.29

107,150

Database

Field

ID_1

173

13.68

2,366

Credits:

  Memory stats from DocumentAnalyzer_V1.14 by Rob Wunderlich

Tags (3)
Labels (1)
Comments
Specialist
Specialist

Azam, thanks for the post!

0 Likes
MVP & Luminary
MVP & Luminary

Good Post Azam. Thanks for sharing in community.

0 Likes
Not applicable

Worth reading...good post

0 Likes

Nice! How data will reduce while using Autonumber?

0 Likes
Creator III
Creator III

In this case because of the way that the unique identifier was contructed by concatenation, using AutoNumber was a simple way to convert it to a numeric value. Storing numbers is more efficient than storing strings.

Thanks,

Azam

0 Likes
Master II
Master II

Very Nice & useful

Thank's for sharing

Saludos,

Enrique

0 Likes
Version history
Revision #:
1 of 1
Last update:
‎2013-12-04 07:47 AM
Updated by: