shaheermecci
Contributor II

Dataset comparison & clustering

Hello,

Before I try a different approach to comparing the data, I thought I would post to the community, since I will be using QlikView to visualize the result.

I have many datasets with a varying number of elements. All data is in a single table. Values within a dataset are unique.

I want to compare the Values in each dataset and provide a degree of similarity. I have about 6000 such datasets with 50-200 unique Values each. A hypothetical scenario is shown in the table below:


1) How can a user choose 'Set1' and see the closest datasets? In the example, Set2 is closest at a 50% match, followed by Set3 at 17%.

     - Should I build something in the load script? I don't have other tools to prep the data outside QlikView.

     - Could Intersect (joining on the Values field) be used to compare the Default State with an Alternate State? I am not sure whether this is possible (that is, listing the datasets whose Values in the $ state equal Values in the alternate state, in descending order of match).
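
The match percentages quoted in the example are consistent with dividing the number of shared Values by the number of Values in the chosen set. The following is only a minimal Python sketch of that arithmetic, not QlikView script; the function names `match_pct` and `rank_closest` are hypothetical, and the data is the three example sets from this post.

```python
# Example Values copied from the table in this post.
datasets = {
    "Set1": {1, 2, 3, 4, 5, 6},
    "Set2": {2, 3, 6, 7, 8},
    "Set3": {6, 7, 8, 9, 10, 11, 12},
}


def match_pct(base, other):
    """Fraction of base's Values that also occur in other (directional)."""
    return len(base & other) / len(base)


def rank_closest(chosen, datasets):
    """Return (name, match) pairs for every other dataset, best match first."""
    base = datasets[chosen]
    scores = [
        (name, match_pct(base, values))
        for name, values in datasets.items()
        if name != chosen
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


for name, score in rank_closest("Set1", datasets):
    print(f"{name}: {score:.0%}")
```

For 'Set1' this prints Set2 at 50% and then Set3 at 17%. The measure is directional: Set2 compared against Set1 gives 60%, because Set2 has only five Values. With roughly 6000 datasets there are about 36 million directional pairs, so scoring only the chosen set on demand may be more practical than precomputing every pair.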


2) If possible, I would like to visually cluster similar datasets, but that is optional.

     - I may need a 'similarity index' for each dataset. If anyone has done something similar I would love to know how.
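
Clustering generally needs a symmetric score, one that is the same for "1 & 2" and "2 & 1". One common choice is the Jaccard index (shared Values divided by all distinct Values in either set). The post does not name this measure; it is an assumption here, and the Python below is only a sketch of the calculation using the three example sets.

```python
from itertools import combinations

# Example Values copied from the table in this post.
datasets = {
    "Set1": {1, 2, 3, 4, 5, 6},
    "Set2": {2, 3, 6, 7, 8},
    "Set3": {6, 7, 8, 9, 10, 11, 12},
}


def jaccard(a, b):
    """Shared Values divided by all distinct Values in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# One score per unordered pair of datasets.
similarity = {
    (x, y): jaccard(datasets[x], datasets[y])
    for x, y in combinations(sorted(datasets), 2)
}

for (x, y), score in similarity.items():
    print(f"{x} & {y}: {score:.1%}")
```

On the example data this gives 37.5% for Set1 & Set2, 8.3% for Set1 & Set3, and 33.3% for Set2 & Set3. A pairwise table like this could then serve as the similarity input for whatever clustering or network view is used.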


Thanks!

Shaheer

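| Data set | Values | Data set | Values | Data set | Values |
|----------|--------|----------|--------|----------|--------|
| Set1     | 1      | Set2     | 2      | Set3     | 6      |
| Set1     | 2      | Set2     | 3      | Set3     | 7      |
| Set1     | 3      | Set2     | 6      | Set3     | 8      |
| Set1     | 4      | Set2     | 7      | Set3     | 9      |
| Set1     | 5      | Set2     | 8      | Set3     | 10     |
| Set1     | 6      |          |        | Set3     | 11     |
|          |        |          |        | Set3     | 12     |

| Compare | 1 & 2 | 2 & 1 | 2 & 3 | 3 & 2 | 1 & 3 | 3 & 1 |
|---------|-------|-------|-------|-------|-------|-------|
|         | 2     | 2     | 6     | 6     | 6     | 6     |
|         | 3     | 3     | 7     | 7     |       |       |
|         | 6     | 6     | 8     | 8     |       |       |
| Match!  | 50%   | 60%   | 60%   | 43%   | 17%   | 14%   |
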
1 Reply
MarcoWedel

Hi,

maybe helpful:

Community Detection

regards

Marco