My 2 cents about Information Density and Subset Ratio:
Check Information Density and Subset Ratio: always perform a high-level integrity check on your data model. You can see the Information Density and Subset Ratio properties in the Table Viewer (Ctrl + T) by hovering over the fields. Investigate wherever Information Density is less than 100% and inform the architect about the potential issue(s) with NULL values. I always check the Subset Ratio whenever I perform a QlikView join. This way you know how many distinct values of the key field are associated with the other table.
Definitions of Information Density and Subset Ratio (Source – Reference Guide):
Information Density is the number of records that have values (i.e. not NULL) in this field as compared to the total number of records in the table.
Subset ratio is the number of distinct values of this field found in this table as compared to the total number of distinct values of this field (that is other tables as well).
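To make the two definitions concrete, here is a minimal sketch in Python (not QlikView script) that computes both numbers for small in-memory tables. The table names, the field name CustomerID and the sample rows are made up for illustration, and NULL is represented by None; the Table Viewer may round or treat edge cases differently.

```python
def information_density(rows, field):
    """Share of records in one table whose `field` is not NULL (None)."""
    if not rows:
        return 0.0
    non_null = sum(1 for row in rows if row.get(field) is not None)
    return non_null / len(rows)


def subset_ratio(table, all_tables, field):
    """Distinct non-NULL values of `field` in `table`, relative to the
    distinct non-NULL values of `field` across all tables given."""
    def distinct(rows):
        return {row[field] for row in rows if row.get(field) is not None}

    here = distinct(table)
    everywhere = set().union(*(distinct(t) for t in all_tables))
    return len(here) / len(everywhere) if everywhere else 0.0


# Hypothetical sample data: one order has no customer, and customers 3 and 4
# never placed an order.
orders = [{"CustomerID": 1}, {"CustomerID": 2}, {"CustomerID": None}, {"CustomerID": 2}]
customers = [{"CustomerID": 1}, {"CustomerID": 2}, {"CustomerID": 3}, {"CustomerID": 4}]

density_orders = information_density(orders, "CustomerID")                    # 3 of 4 records
ratio_orders = subset_ratio(orders, [orders, customers], "CustomerID")        # 2 of 4 values
ratio_customers = subset_ratio(customers, [orders, customers], "CustomerID")  # 4 of 4 values
print(density_orders, ratio_orders, ratio_customers)
```

In this made-up case the orders table would show 75% information density and a 50% subset ratio on CustomerID, while the customers table would show a 100% subset ratio.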
Thanks a lot! I have been working with QlikView for almost a year now, but somehow that was always a hard-to-grasp issue for me.
I think now I understand it. In short:
- Information density on key fields should be 100%, meaning there are no records with a blank in this field
(e.g. with date fields, that should always be the case, as there should not be blank date fields)
- Subset ratio should also ideally be 100%. If it is less, there might be records in the data model which cannot be properly linked to.
- Keys should always be "Primary" or "Perfect", both of which mean the key field uniquely identifies each single record.
<=> One thing that is at the root of all trouble in our QlikView environment is that there are maybe two dozen tables we draw from a database, and we need to use a number of different keys because, quite often, two or three tables can be linked using one key field, but in another table that field is not present.
I know that is a fundamental issue in database design and whoever made that database should have thought of it, but that doesn't help us ...
The issue is, none of the original designers of the database we use is here anymore, there is no documentation whatsoever and no one knows how to approach it for any changes 😉
=> Do you have a suggestion as to what could be done if you just cannot get a perfect key (sometimes not even a primary key) out of your base data?
Simple approach: get suggestions for key assembly from the SQL queries that try to get data from this database. A SQL query contains relational information that is used to link tables and rows together in the original database. That same information can be used as a starting point for associations and JOIN methods.
A database is never a stand-alone repository. There must be an application that makes use of the data inside by launching queries. Either intercept them in a controlled way or look them up if you have application details available.
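As a rough illustration of mining existing SQL for candidate keys, the sketch below (Python, not QlikView script) scans query text for equality conditions of the form a.col = b.col and counts how often each pair appears. The queries, table names and column names are hypothetical, and the regular expression is deliberately naive: it does not resolve aliases to table names and will miss joins written in other ways.

```python
import re
from collections import Counter

# Matches equality conditions of the form  a.col = b.col
JOIN_CONDITION = re.compile(r"(\w+)\.(\w+)\s*=\s*(\w+)\.(\w+)")


def candidate_keys(queries):
    """Count how often each pair of alias.column references is equated."""
    counts = Counter()
    for sql in queries:
        for t1, c1, t2, c2 in JOIN_CONDITION.findall(sql):
            # Sort the two sides so that a = b and b = a count as the same pair.
            pair = tuple(sorted([f"{t1}.{c1}".lower(), f"{t2}.{c2}".lower()]))
            counts[pair] += 1
    return counts


# Hypothetical queries, e.g. collected from application logs or source code.
queries = [
    "SELECT * FROM orders o JOIN customers c ON o.cust_id = c.id",
    "SELECT * FROM customers c, orders o WHERE c.id = o.cust_id AND o.status = 'open'",
    "SELECT * FROM orders o JOIN order_lines l ON o.order_no = l.order_no",
]

keys = candidate_keys(queries)
for pair, count in keys.most_common():
    print(count, pair)
```

Pairs that show up repeatedly are reasonable first candidates for key fields in the QlikView data model; they still need to be checked against the actual data.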
Great answer, I have been mystified by this concept too. However, be cautious when it comes to hierarchy tables, for example a Product Hierarchy. A lower % means that there are mid-level nodes which of course don't have child Products.
I found this with my Material - Product Hierarchy table. I had a 7% subset ratio with 100% information density, while the corresponding ProductHier key table had a 99% subset ratio. So if the user selects the Product Hierarchy object, transactions drop off.
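A toy version of the hierarchy case, sketched in Python with invented node keys (these are not the real figures quoted above): every node has a key in the hierarchy table, but materials are only assigned to leaf nodes, so the material side covers just part of the distinct key values.

```python
# Hypothetical hierarchy: top-level, mid-level and leaf nodes all have a key.
hierarchy_keys = {"A", "A1", "A2", "A1a", "A1b", "A2a"}
# Materials are only ever assigned to leaf nodes.
material_keys = {"A1a", "A1b", "A2a"}

all_keys = hierarchy_keys | material_keys

ratio_material = len(material_keys) / len(all_keys)    # 3 of 6 distinct keys
ratio_hierarchy = len(hierarchy_keys) / len(all_keys)  # 6 of 6 distinct keys

# Nodes with no material attached: selecting only these finds no transactions.
unlinked_nodes = hierarchy_keys - material_keys
print(ratio_material, ratio_hierarchy, sorted(unlinked_nodes))
```

Here a subset ratio below 100% on the material side reflects the shape of the hierarchy rather than a data quality problem.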