shaheermecci
Contributor II

Dataset comparison & clustering

Hello,

Before I try a different approach to data comparison, I thought I would post to the community, since I will be using QlikView to visualize the result.

I have many datasets with a varying number of elements. All data is in a single table. Values within a dataset are unique.

I want to compare the Values in each dataset and provide a degree of similarity. I have about 6000 such datasets, each with 50-200 unique Values. A hypothetical scenario is shown in the table below:


1) How can a user choose 'Set1' and see the closest datasets? In the example, Set2 is closest at a 50% match, followed by Set3 at a 17% match.

     - Should I build something in the Load script? I don't have other tools to prep the data outside QlikView.

     - Could using Intersect (join by the Values field) to compare the Default State to an Alternate State be a possible approach? I am not sure if this is possible (list the datasets where Values from the $ State = Values from the Alt State, in descending order of match).
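
To make the calculation in 1) concrete, here is a minimal sketch in Python (not QlikView script), assuming the match is defined as the number of shared Values divided by the number of Values in the selected dataset. That definition is inferred from the percentages in the example; the function names `match_pct` and `rank_closest` are made up for illustration, and the data is the hypothetical Set1 to Set3 example from this post.

```python
# Hypothetical example data, copied from the example table in this post.
datasets = {
    "Set1": {1, 2, 3, 4, 5, 6},
    "Set2": {2, 3, 6, 7, 8},
    "Set3": {6, 7, 8, 9, 10, 11, 12},
}


def match_pct(selected, other):
    """Share of the selected dataset's Values that also occur in the other one."""
    return len(selected & other) / len(selected)


def rank_closest(name, data):
    """Return (other_name, match) pairs for one selected dataset, best match first."""
    selected = data[name]
    scores = [
        (other, match_pct(selected, values))
        for other, values in data.items()
        if other != name
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


for other, score in rank_closest("Set1", datasets):
    print(f"{other}: {score:.0%}")
# prints:
# Set2: 50%
# Set3: 17%
```

Note that this measure is directional: Set1 compared to Set2 gives 50%, while Set2 compared to Set1 gives 60%, because the divisor is the size of the selected dataset.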


2) If possible, I would like to visually cluster similar datasets, but that is optional.

     - I may need a 'similarity index' for each dataset. If anyone has done something similar I would love to know how.
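
One candidate for such a 'similarity index' is the Jaccard index (shared Values divided by all distinct Values across both datasets), which is symmetric, unlike the directional Match percentage. The sketch below is in Python rather than QlikView script and is only an illustration under assumptions: the Jaccard index and the threshold-based grouping are my own choices, not an established method for this data, and `group_by_threshold` and the threshold values are hypothetical.

```python
from itertools import combinations

# Hypothetical example data, copied from the example table in this post.
datasets = {
    "Set1": {1, 2, 3, 4, 5, 6},
    "Set2": {2, 3, 6, 7, 8},
    "Set3": {6, 7, 8, 9, 10, 11, 12},
}


def jaccard(a, b):
    """Symmetric similarity: shared Values / all distinct Values in both sets."""
    return len(a & b) / len(a | b)


def similarity_pairs(data):
    """Jaccard index for every unordered pair of datasets."""
    return {
        (x, y): jaccard(data[x], data[y])
        for x, y in combinations(sorted(data), 2)
    }


def group_by_threshold(data, threshold):
    """Group datasets that are linked, directly or through a chain of other
    datasets, by pairs with Jaccard >= threshold (connected components)."""
    parent = {name: name for name in data}

    def find(name):
        # Follow parent links to the group's root, shortening the path on the way.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for (x, y), score in similarity_pairs(data).items():
        if score >= threshold:
            parent[find(x)] = find(y)

    groups = {}
    for name in data:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


print(similarity_pairs(datasets)[("Set1", "Set2")])  # 0.375
print(group_by_threshold(datasets, 0.35))  # [['Set1', 'Set2'], ['Set3']]
```

With 6000 datasets there are roughly 18 million unordered pairs, so in practice only pairs above some threshold would probably be worth keeping for the visualization.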


Thanks!

Shaheer

Data set  Values    Data set  Values    Data set  Values
Set1      1         Set2      2         Set3      6
Set1      2         Set2      3         Set3      7
Set1      3         Set2      6         Set3      8
Set1      4         Set2      7         Set3      9
Set1      5         Set2      8         Set3      10
Set1      6                             Set3      11
                                        Set3      12

Compare   1 & 2   2 & 1   2 & 3   3 & 2   1 & 3   3 & 1
          2       2       6       6       6       6
          3       3       7       7
          6       6       8       8
Match!    50%     60%     60%     43%     17%     14%
1 Reply
MarcoWedel

Hi,

maybe helpful:

Community Detection

regards

Marco