Hello,
Before I try a different approach to data comparison, I thought I would post to the community, since I will be using QlikView to visualize the result.
I have many datasets with a varying number of elements. All data is in a single table. Values within a dataset are unique.
I want to compare the Values across datasets and report a degree of similarity for each pair. I have about 6000 such datasets, each with 50-200 unique Values. A hypothetical scenario is shown in the table below:
1) How can a user select 'Set1' and see the closest datasets? In the example, Set2 is the closest at a 50% match, followed by Set3 at 17%.
- Should I build something in the Load script? I don't have other tools to prep the data outside QlikView.
- Could an Intersect (joining on the Values field) that compares the Default State to an Alternate State be a possible approach? I am not sure this is possible (that is, listing the datasets where Values from the $ State = Values from the Alt State, in descending order of match).
2) If possible, I would like to visually cluster similar datasets, but that is optional.
- I may need a 'similarity index' for each dataset. If anyone has done something similar I would love to know how.
Thanks!
Shaheer
| Data set | Values | Data set | Values | Data set | Values |
|---|---|---|---|---|---|
| Set1 | 1 | Set2 | 2 | Set3 | 6 |
| Set1 | 2 | Set2 | 3 | Set3 | 7 |
| Set1 | 3 | Set2 | 6 | Set3 | 8 |
| Set1 | 4 | Set2 | 7 | Set3 | 9 |
| Set1 | 5 | Set2 | 8 | Set3 | 10 |
| Set1 | 6 | | | Set3 | 11 |
| | | | | Set3 | 12 |

| Compare | 1 & 2 | 2 & 1 | 2 & 3 | 3 & 2 | 1 & 3 | 3 & 1 |
|---|---|---|---|---|---|---|
| | 2 | 2 | 6 | 6 | 6 | 6 |
| | 3 | 3 | 7 | 7 | | |
| | 6 | 6 | 8 | 8 | | |
| Match! | 50% | 60% | 60% | 43% | 17% | 14% |
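
To make the arithmetic behind the Match! row explicit, here is a minimal sketch in Python. It only illustrates the calculation and is not a QlikView solution. It assumes the match is directional: the number of Values the two datasets share, divided by the number of Values in the first dataset of the pair. The names `datasets`, `match_pct` and `closest` are made up for this illustration.

```python
# Hypothetical data copied from the example table.
datasets = {
    "Set1": {1, 2, 3, 4, 5, 6},
    "Set2": {2, 3, 6, 7, 8},
    "Set3": {6, 7, 8, 9, 10, 11, 12},
}


def match_pct(base, other):
    """Percentage of base's Values that also appear in other (rounded).

    Assumption: the match is directional, so the denominator is the
    size of the base dataset, not of the union of both datasets.
    """
    common = datasets[base] & datasets[other]
    return round(100 * len(common) / len(datasets[base]))


def closest(base):
    """All other datasets ranked by match against base, highest first."""
    scores = [(other, match_pct(base, other))
              for other in datasets if other != base]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for base in datasets:
        print(base, closest(base))
```

Because the denominator is the size of the selected dataset, "1 & 2" and "2 & 1" give different percentages. A single 'similarity index' per pair would need a symmetric denominator instead, for example the size of the union of the two datasets, which is a different measure from the one shown in the table.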