Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
information density and subset ratio? What is the use of this?
thanks in advance
Information density of the field, which indicates the percentage of rows that contain a non-null value
Subset ratio, which shows the percentage of all distinct values for a field in the
table compared to all the distinct values for that field in the entire data model. It is
only relevant for key fields since they are present in multiple tables and do not all
share the same value.
Subset ratios can be used to easily spot problems in key field
associations.
For example, when the combined total of subset ratios for multiple
tables is 100 percent, this may indicate that there are no matching keys between
these tables.
REF: QlikView 11 For Developers..
Let me give you simple example
Sales:
Load * Inline
[
Customer, Sales
A, 100
B, 200
D, 300
];
Customer:
Load * Inline
[
Customer
A
B
C
D
];
If you write above sample script and will check the Table (CTRL + T), you will find two tables.
Sales and Customers
On Sales Table, if you hover the mouse on Customer field, you can see the Subset Ratio is 75% because there is not sales data for Customer C.
Now if you change the script for table Customer like below..
Customer:
Load * Inline
[
Customer
A
B
C
D
]
Where Exists (Customer);
If will not load the Customer C as there are no sales data for the same.
Now check the Subset Ratio. It will be 100%.
If subset ratio is less than 100%, the key is called as Primary Key
but for 100% it is called as Perfect Key.
Information density of the field, which indicates the percentage of rows that contain a non-null value
Subset ratio, which shows the percentage of all distinct values for a field in the
table compared to all the distinct values for that field in the entire data model. It is
only relevant for key fields since they are present in multiple tables and do not all
share the same value.
Subset ratios can be used to easily spot problems in key field
associations.
For example, when the combined total of subset ratios for multiple
tables is 100 percent, this may indicate that there are no matching keys between
these tables.
REF: QlikView 11 For Developers..
Let me give you simple example
Sales:
Load * Inline
[
Customer, Sales
A, 100
B, 200
D, 300
];
Customer:
Load * Inline
[
Customer
A
B
C
D
];
If you write above sample script and will check the Table (CTRL + T), you will find two tables.
Sales and Customers
On Sales Table, if you hover the mouse on Customer field, you can see the Subset Ratio is 75% because there is not sales data for Customer C.
Now if you change the script for table Customer like below..
Customer:
Load * Inline
[
Customer
A
B
C
D
]
Where Exists (Customer);
If will not load the Customer C as there are no sales data for the same.
Now check the Subset Ratio. It will be 100%.
If subset ratio is less than 100%, the key is called as Primary Key
but for 100% it is called as Perfect Key.
Manoj - Reference Guide is your friend!
My 2 cents about Information Density and Subset Ratio:
Check for Information Density and Subset Ratio: Always perform high level integrity check on your data model. You can see Information Density and Subset Ratio properties in the Table Viewer (Ctrl + T) by hovering on the fields. Investigate wherever Information Density is less than 100% and inform the Architect about the potential issue(s) with the NULL values. I would always check for Subset Ratio whenever I perform a QlikView Join. This way you know how many key field distinct values are associated to other table.
Definitions of Information Density and Subset Ratio (Source – Reference Guide):
Also, check what is Perfect Key...
http://community.qlik.com/thread/55769
I hope this helps!
Cheers,
DV
Hi Manish,
thanks a lot! I have been working with QlikView for almost a year now, but somehow that was always a hard-to-grasp issue for me.
I think now I understand it. In short:
- Information density on keyfields should be 100%, meaning there are no records with a blank in this field
(e.g. with date_fields, that should always be the case as there should not be blank date_fields)
- Subset ratio should also ideally be 100%. If it is less, there might be records in the data_model which cannot be properly linked to.
- Keys should always be "Primary" or "Perfect", both of which means the keyfield uniquely identifies each single record.
<=> One thing that is at the root of all trouble in our QlikView_environment is that there are maybe two dozen tables we draw from a database and we need to use a nr. of different keys because, quite often, two or three tables can be linked using one keyfield - but in another table, that field is not present.
I know that is a fundamental issue in database_design and whoever made that database should have thought of it, but that doesn't help us ...
The issue is, none of the original designers of the database we use is here anymore, there is no documentation whatsoever and no one knows how to approach it for any changes 😉
=> Do you have a suggestion as to what could be done if you just cannot get a perfect_key (sometimes not even a primary_key) out of your base_data?
Thanks a lot!
Best regards,
DataNibbler
Dear DataNibbler,
Honestly speaking, I don't have much knowledge on database, so would love if someone from Community will
answer your question.
Simple approach: get suggestions for key assembly from the SQL queries that try to get data from this database. A SQL query contains relational information that is used to link tables and rows together in the original database. That same information can be used as a starting point for associations and JOIN methods.
A database is never a stand-alone repository. There must be an application that makes use of the data inside by launching queries. Either intercept them in a controlled way or look them up if you have application details available.
Good luck,
Peter
This is an interesting issue. Better launch this as a separate thread. I'm sure the community will be a big help in figuring this out.
Peter
Well explained
Hi,
Good Example of subset ratio.
Can you give an example for the information density as well?
Thanks
Great answer I have been mystified bythis concept too. However, caution when it comes to Hierarchy tables say for eg. Product Hierarchy. A lower % means that there are mid-level nodes which of course don't have child Products.
This I found with my Material - Product Hierarchy table . I had 7% subset ratio with 100% information density, with the corresponding ProductHier key table with 99% subset ratio. So if the user selects the Product Hierarchy object - transactions drop off.