Not applicable

How do I load a few tables without consuming 50GB + RAM?

First, you should know this is my first attempt at using QlikView.  I am a highly experienced Power Pivot user and am trying to replicate in QlikView some analysis I have done in Power Pivot and Excel.

I am using a SQL Server database as the data source.  I am going to reduce from the numbers below, because the load consumed all the RAM I had (24 GB), filled the C drive with the swap file, and still had not loaded any data.  I am using the OLE DB driver and basic SELECT scripts in the load script.
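
For reference, here is roughly what my script looks like today (the server, database and object names below are placeholders, not my real ones):

OLEDB CONNECT TO [Provider=SQLOLEDB;Data Source=MyServer;Initial Catalog=MyDatabase;Integrated Security=SSPI];

// the wide view plus the other tables, all pulled with plain SELECTs
SQL SELECT * FROM dbo.MyWideView;
SQL SELECT * FROM dbo.MyFactTable1;
// ... and so on for the remaining tables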

From what I can tell, this has to do with the high volume of text (varchar) columns I am pulling in.  I am hitting one view with 7,000 columns, which is my largest table to pull in.  I am pulling in 13 tables, 4 of which can be loosely classified as fact tables.  That means the remaining tables are going to have heavy text usage in them.

My current source size is 16,061 rows and 15 MB.  I add a pivoted version of the data later, which could double this.

My next steps are to reduce the tables to a small footprint, starting with the dimension tables and then adding a fact table.  But I need to understand what basic QlikView design principle I am missing, as this data is tiny.

1 Solution

Accepted Solutions
Miguel_Angel_Baeyens

Hi,

Just to make sure I understand your question: does loading 16K rows, worth 15 MB on disk, use 50 GB+ of RAM?

QlikView links tables through fields with the same name. When QlikView finds more than one field with the same name in more than one table, it generates what is known as a "synthetic key": a table that stores all possible combinations of values across the identically named fields. It's not unusual, when you start using QlikView, to just follow the steps in the wizard, not rename any fields, and let QlikView "solve" these keys itself. This takes a lot of CPU and RAM, and it might be the cause of a huge usage of resources if the fields have long, alphanumeric values.
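
As a made-up example of how this happens (the tables and fields below are invented just to show the effect), because both tables share CustomerID and Region, QlikView will build a synthetic key holding every combination of those two fields:

Orders:
LOAD * INLINE [
CustomerID, Region, OrderAmount
1, North, 100
2, South, 200
];

Shipments:
LOAD * INLINE [
CustomerID, Region, ShipCost
1, North, 10
2, South, 20
];

// CustomerID and Region appear in both tables, so QlikView creates a
// $Syn table with every combination of the two shared values.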

The usual process, and best practice, is to use the LOAD statement and rename fields as needed so you can associate (link) tables the way you want, even creating new composite key fields if you need to, along with all the other functionality the QlikView script offers for loading data into memory.
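
A rough sketch of what that looks like (again, the names are only examples): a preceding LOAD on top of the SQL SELECT builds one composite key and drops the duplicated fields, so only the intended association remains.

Orders:
LOAD
    CustomerID & '|' & Region AS %OrderShipKey,  // composite key replaces the two shared fields
    OrderAmount;
SQL SELECT CustomerID, Region, OrderAmount FROM dbo.Orders;

Shipments:
LOAD
    CustomerID & '|' & Region AS %OrderShipKey,
    ShipCost;
SQL SELECT CustomerID, Region, ShipCost FROM dbo.Shipments;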

I don't know your source data model, but note that QlikView is not relational, it is associative. That means, among many other things, that the more distinct values a field has, the more memory it will use. For example, doing the rough maths, if you have a phone number field that stores 13 characters (001 212 555 1234) and has 1000 distinct values, you will need roughly 13 x 1000 = 13,000 bytes of RAM just for those 1000 phone numbers.

However, QlikView does not store the same value twice. So if instead you have one field for the 3-character international prefix, another for the 3-character area code, and the local number in a third 7-character field, and assuming all your numbers are for NY, the rough maths would be:

  • 001 is stored only once: 3 bytes
  • 212 is stored only once: 3 bytes
  • 1000 distinct local numbers: 7 x 1000 = 7000 bytes

That sums up to a total of 7,006 bytes of RAM: almost half, with the very same number of rows and values in the source, just by having three fields instead of one!

Now think of applying this logic to dates (day, month, year), timestamps, addresses, name prefixes... It takes some work in the load script, but it's worth it.
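
For example, a preceding LOAD along these lines (field and table names are invented, and it assumes the phone numbers all follow the "001 212 555 1234" layout) splits one long value into a few low-cardinality parts:

Calls:
LOAD
    Left(PhoneNumber, 3)   AS CountryPrefix,  // "001", stored once
    Mid(PhoneNumber, 5, 3) AS AreaCode,       // "212", stored once
    Right(PhoneNumber, 8)  AS LocalNumber,    // "555 1234"
    Year(CallTimestamp)    AS CallYear,
    Month(CallTimestamp)   AS CallMonth,
    Day(CallTimestamp)     AS CallDay;
SQL SELECT PhoneNumber, CallTimestamp FROM dbo.Calls;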

Hope that makes sense.

Miguel


5 Replies

Not applicable
Author

That makes sense.  What concerns me is how much work I am going to have to do to get QlikView to "play nice" with my data.  I have a lot of field names that are the same, as would be expected.

So, as you note, it takes some work in the load script.  As a result, I will now have to manage that script to keep the refresh somewhat efficient.  I will work through my script and substantially reduce my sample set for the demo.  Ideally I should have been able to consume the data efficiently, especially with any kind of compression in play.

This feels more like work I would do when putting together an Analysis Services Multidimensional cube.  Basically, it works great if the data is correctly configured and formatted for use by the product.

I will update what I was able to get done here and mark answers accordingly.

Thanks.


Miguel_Angel_Baeyens

Hi Steve,

You're welcome. Yes, loading data the right way varies from tool to tool, and it takes some time to get used to this new logic.

Now that you mention it, I'm no Microsoft BI expert, but I have been able to load data using MDX queries, with the appropriate driver, directly from Analysis Services cubes. I don't know whether that helps you, but I guess the MDX part is the easy one for you...

Kind regards,

Miguel

Not applicable
Author

Thanks Miguel,

I have worked with SSAS cubes for years. What has changed is that I can load data into a Power Pivot in-memory model and build relationships after the fact.  QlikView operates on a model similar to cubes from what I can tell; it's still early for me.  QlikView requires the data to be properly formed, either at the source or in the LOAD script, in order to be managed properly in the in-memory database. This is one of the keys to successful cubes, and my guess is the same is true here.

I was expecting QlikView to function more as a power-user tool, as opposed to needing IT support beyond access to the data. I will work with the LOAD functionality and see how close I can get to the Power Pivot models.

Thanks again for the help on this.

Regards,
Steve

Not applicable
Author

Miguel,

As I am working through the load process and recreating all of my SQL in the script, I think I figured out what my core issue was: circular relationships.  Due to the associative nature of the data, QlikView draws relationships between "fact" tables, which results in circular relationships.  I hit the first one as I added the second fact table.

Next up, I need to figure out how to fix the bad associations; a sketch of what I am planning to try is below.
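
From what I have read so far, the fix looks like one of these two patterns (table and field names below are placeholders, so treat this as a sketch rather than my final script):

// Option 1: rename the shared field in one fact table so the unwanted link never forms
FactShipments:
LOAD
    OrderID AS ShipmentOrderID,  // no longer associates with FactOrders.OrderID
    ShipDate,
    ShipCost;
SQL SELECT OrderID, ShipDate, ShipCost FROM dbo.FactShipments;

// Option 2: qualify every field name and unqualify only the intended keys
// (placed before the LOAD statements it should affect)
QUALIFY *;
UNQUALIFY CustomerID;  // CustomerID remains the shared link field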

Thanks again,

Steve