Skip to main content
Announcements
Have questions about Qlik Connect? Join us live on April 10th, at 11 AM ET: SIGN UP NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
jmvilaplanap
Specialist
Specialist

How much memory uses a null value in a table?

Hi all,

In my models, normally I create a Fact Table with several facts, concatenating individual fact tables (later is easy to use, using set analysis expressions to filter the data set)

Then finally I have a Fact Table with a lot of fields with null values for example:

DataSetDim1 - IDDim2 - IDDim3 - IDValue1Value2Value3
11--10--
12--12--
2-1--34-
2-2--33-
3--1--21
3--2--53

From time to time, some datasets have shared dimensions but I still have a lot of nulls. For example, in a model I have a Fact Table with 8 distinct datasets, 1.5M rows and 60 Fields (between dimension keys and values).

There are better way to do this? Is this efficient?

Thanks

1 Solution

Accepted Solutions
johnw
Champion III
Champion III

I'm going on memory from a post by hic, but I was unable to track it down with a brief search.

Your QlikView tables are really storing index values into another table, which is the list of distinct values for that field. The more distinct values you have, the more bits this index requires. However, as I recall, null() is special, it being a possible "value" for every field, so it has its own designation that I believe is independent of the number of distinct values for the field. It's something small, like a couple of bits, though actual memory use depends on some other factors as well that I'm not remembering the details of, something like your table must be an even number of bytes wide or something, so the nulls might take zero bits, or 8 bits, or something like that. Not certain. In any case, my main takeaway was that I don't need to worry much about nulls in my data model. Yes, they take space, but it's not a lot of space. As you said, they're as the air of a ball - we can't see them, but they have weight - but like the air, very little weight.

There's a chance I've remembered wrong. I don't actually know what I'm talking about. Take what I say with a grain of salt.

View solution in original post

11 Replies
Anonymous
Not applicable

I guess null() consumes no memory because "it isn't there"...

jmvilaplanap
Specialist
Specialist
Author

Hi Robin

I think nulls are as the air of a ball, we can't see it but it has a weight.

I made a test with this script:

Table:
LOAD
null()          AS nothing
AutoGenerate(10);
STORE Table into 'Table.qvd' (qvd);


And the size of the QVD is 1,311 bytes with 10 rows and 2,216 bytes with 1,000 rows. But the point is if exist another way better than this one.

Thanks.

Anonymous
Not applicable

I don't know exactly how it works, but I know, that the main factor for performance problems is the number of distinct values.

Perhaps the experts can leave a comment.

rwunderlichtroyanskyjohnw

Kushal_Chawda

I don't think that the nulls occupy more spaces. It's the more number of unique values in particular field occupies more memory. AFAIK, Null values does not have any impact on performance.

marcus_sommer

I agree NULL itself don't consume memory but the structure of a table respectively the datamodel will of course have an impact.

Your used approach is quite common and worked usually very well and quite performant within the GUI. Nevertheless I think if you could join or map your tables together you would save some RAM because you would need lesser records and therefore the table-pointer would be smaller. With your example above it could be only 2 records instead of 6.

Here an example which is a bit similar to your question even if the NULL handling wasn't the focus there: Re: Data Model Question - Fat or Thin Fact table?

In reality it might be not so easy if your tables aren't 1:1 else 1:n or even n:m related and/or there are missing key-values between them. In such cases your needed transformations might be take longer and are a quite more complex. As far as your approach is fast enough and you are not running into a RAM bottleneck I wouldn't change it.

- Marcus

johnw
Champion III
Champion III

I'm going on memory from a post by hic, but I was unable to track it down with a brief search.

Your QlikView tables are really storing index values into another table, which is the list of distinct values for that field. The more distinct values you have, the more bits this index requires. However, as I recall, null() is special, it being a possible "value" for every field, so it has its own designation that I believe is independent of the number of distinct values for the field. It's something small, like a couple of bits, though actual memory use depends on some other factors as well that I'm not remembering the details of, something like your table must be an even number of bytes wide or something, so the nulls might take zero bits, or 8 bits, or something like that. Not certain. In any case, my main takeaway was that I don't need to worry much about nulls in my data model. Yes, they take space, but it's not a lot of space. As you said, they're as the air of a ball - we can't see them, but they have weight - but like the air, very little weight.

There's a chance I've remembered wrong. I don't actually know what I'm talking about. Take what I say with a grain of salt.

jmvilaplanap
Specialist
Specialist
Author

Thanks all for you opinion, I understand that is not necesary to be worried about the null fields, only in the "diversity" of values inside a column. But then, why people do snowflakes schemas? I think you always could do a model as a star, isn't it?

Maybe this is another debate.

Oleg_Troyansky
Partner Ambassador/MVP
Partner Ambassador/MVP

Hi,

sorry I'm late to this discussion... Yes, I can confirm that nulls should not worry you too much. I don't have exact benchmarks of the memory consumption, but it should not be extremely high.

To answer your last question about snowflakes - people usually build snowflakes for their convenience and ease of understanding their data, not to avoid null values. Star schema will always perform better and faster than a snowflake, so if performance is an issue, I'd personally recommend a star over a snowflake.

However, if the data size is manageable, then a snowflake might be easier to build, maintain, and understand, and that holds a lot of value.

cheers,

Oleg Troyansky

Upgrade your Qlik skills at the Masters Summit for Qlik - coming to Boston, MA this October!

jmvilaplanap
Specialist
Specialist
Author

Thanks troyansky

But, Why is easier to maintain a snowflake than a star schema? In my opinion star is easier. Concatenating fact tables you can do a star schema almost always.

When I was doing my very firsts Qlik data models I was trying to design a snowflake schemas (like a conventional  relational database) but when I understood the Qlik philosophy I began to use a star schemas, they are (in my opinion) easier to design and maintain.

Regards