Skip to main content
Woohoo! Qlik Community has won “Best in Class Community” in the 2024 Khoros Kudos awards!
Announcements
Nov. 20th, Qlik Insider - Lakehouses: Driving the Future of Data & AI - PICK A SESSION
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

information density and subset ratio? What is the use of this?

information density and subset ratio? What is the use of this?

thanks in advance

1 Solution

Accepted Solutions
MK_QSL
MVP
MVP

Information density of the field, which indicates the percentage of rows that contain a non-null value

Subset ratio, which shows the percentage of all distinct values for a field in the

table compared to all the distinct values for that field in the entire data model. It is

only relevant for key fields since they are present in multiple tables and do not all

share the same value.

Subset ratios can be used to easily spot problems in key field

associations.

     For example, when the combined total of subset ratios for multiple

tables is 100 percent, this may indicate that there are no matching keys between

these tables.

REF: QlikView 11 For Developers..


Let me give you simple example


Sales:

Load * Inline

[

  Customer, Sales

  A, 100

  B, 200

  D, 300

];

Customer:

Load * Inline

[

  Customer

  A

  B

  C

  D

];

If you write above sample script and will check the Table (CTRL + T), you will find two tables.

Sales and Customers


On Sales Table, if you hover the mouse on Customer field, you can see the Subset Ratio is 75% because there is not sales data for Customer C.

Now if you change the script for table Customer like below..

Customer:

Load * Inline

[

  Customer

  A

  B

  C

  D

]

Where Exists (Customer);

If will not load the Customer C as there are no sales data for the same.

Now check the Subset Ratio. It will be 100%.

If subset ratio is less than 100%, the key is called as Primary Key

but for 100% it is called as Perfect Key.

View solution in original post

17 Replies
MK_QSL
MVP
MVP

Information density of the field, which indicates the percentage of rows that contain a non-null value

Subset ratio, which shows the percentage of all distinct values for a field in the

table compared to all the distinct values for that field in the entire data model. It is

only relevant for key fields since they are present in multiple tables and do not all

share the same value.

Subset ratios can be used to easily spot problems in key field

associations.

     For example, when the combined total of subset ratios for multiple

tables is 100 percent, this may indicate that there are no matching keys between

these tables.

REF: QlikView 11 For Developers..


Let me give you simple example


Sales:

Load * Inline

[

  Customer, Sales

  A, 100

  B, 200

  D, 300

];

Customer:

Load * Inline

[

  Customer

  A

  B

  C

  D

];

If you write above sample script and will check the Table (CTRL + T), you will find two tables.

Sales and Customers


On Sales Table, if you hover the mouse on Customer field, you can see the Subset Ratio is 75% because there is not sales data for Customer C.

Now if you change the script for table Customer like below..

Customer:

Load * Inline

[

  Customer

  A

  B

  C

  D

]

Where Exists (Customer);

If will not load the Customer C as there are no sales data for the same.

Now check the Subset Ratio. It will be 100%.

If subset ratio is less than 100%, the key is called as Primary Key

but for 100% it is called as Perfect Key.

IAMDV
Luminary Alumni
Luminary Alumni

Manoj - Reference Guide is your friend!


My 2 cents about Information Density and Subset Ratio:

Check for Information Density and Subset Ratio: Always perform high level integrity check on your data model. You can see Information Density and Subset Ratio properties in the Table Viewer (Ctrl + T) by hovering on the fields. Investigate wherever Information Density is less than 100% and inform the Architect about the potential issue(s) with the NULL values. I would always check for Subset Ratio whenever I perform a QlikView Join. This way you know how many key field distinct values are associated to other table.

Definitions of Information Density and Subset Ratio (Source – Reference Guide):

    • Information Density is the number of records that have values (i.e. not NULL) in this field as compared to the total number of records in the table.
    • Subset ratio is the number of distinct values of this field found in this table as compared to the total number of distinct values of this field (that is other tables as well).


Also, check what is Perfect Key...

http://community.qlik.com/thread/55769

I hope this helps!

Cheers,

DV

datanibbler
Champion
Champion

Hi Manish,

thanks a lot! I have been working with QlikView for almost a year now, but somehow that was always a hard-to-grasp issue for me.

I think now I understand it. In short:

- Information density on keyfields should be 100%, meaning there are no records with a blank in this field

  (e.g. with date_fields, that should always be the case as there should not be blank date_fields)

- Subset ratio should also ideally be 100%. If it is less, there might be records in the data_model which cannot be properly linked to.

- Keys should always be "Primary" or "Perfect", both of which means the keyfield uniquely identifies each single record.

<=> One thing that is at the root of all trouble in our QlikView_environment is that there are maybe two dozen tables we draw from a database and we need to use a nr. of different keys because, quite often, two or three tables can be linked using one keyfield - but in another table, that field is not present.

I know that is a fundamental issue in database_design and whoever made that database should have thought of it, but that doesn't help us ...

The issue is, none of the original designers of the database we use is here anymore, there is no documentation whatsoever and no one knows how to approach it for any changes 😉

=> Do you have a suggestion as to what could be done if you just cannot get a perfect_key (sometimes not even a primary_key) out of your base_data?

Thanks a lot!

Best regards,

DataNibbler

MK_QSL
MVP
MVP

Dear DataNibbler,

Honestly speaking, I don't have much knowledge on database, so would love if someone from Community will

answer your question.

Peter_Cammaert
Partner - Champion III
Partner - Champion III

Simple approach: get suggestions for key assembly from the SQL queries that try to get data from this database. A SQL query contains relational information that is used to link tables and rows together in the original database. That same information can be used as a starting point for associations and JOIN methods.

A database is never a stand-alone repository. There must be an application that makes use of the data inside by launching queries. Either intercept them in a controlled way or look them up if you have application details available.

Good luck,

Peter

Peter_Cammaert
Partner - Champion III
Partner - Champion III

This is an interesting issue. Better launch this as a separate thread. I'm sure the community will be a big help in figuring this out.

Peter

Anonymous
Not applicable
Author

Well explained

Anonymous
Not applicable
Author

Hi,

Good Example of subset ratio.

Can you give an  example for the information density as well?

Thanks

Not applicable
Author

Great answer I have been mystified bythis concept too. However, caution when it comes to Hierarchy tables say for eg. Product Hierarchy. A lower % means that there are mid-level nodes which of course don't have child Products.

This I found with my Material - Product Hierarchy table . I had 7% subset ratio with 100% information density, with the corresponding ProductHier key table with 99% subset ratio. So if the user selects the Product Hierarchy object - transactions drop off.