Skip to main content
Announcements
Introducing Qlik Answers: A plug-and-play, Generative AI powered RAG solution. READ ALL ABOUT IT!
cancel
Showing results for 
Search instead for 
Did you mean: 
SDT
Creator
Creator

How to find closest matching item based on multiple attributes

Hello,

I have a table with tens of thousands of Items. Each Item has 12 attributes (separate field for each attribute).

I would like to have the user select one item and get a resulting list of the most similar items based on matching attributes.

I am thinking I need to use a full outer join and LevenshteinDist on a concatenated field of the 12 attributes. I am hoping there is a better way since that will be a massive table.

Another way might be to compare each of the 12 attribute fields separately in a full outer join and then total up the # of matching attributes (certainly this would be more helpful in the real world scenario).

Hoping someone has a more elegant (efficient) solution?

Labels (1)
5 Replies
marksouzacosta
Partner - Specialist
Partner - Specialist

@igoralcantara this may be a good challenge for you.

Read more at Data Voyagers - datavoyagers.net
igoralcantara
Partner - Specialist
Partner - Specialist

This is an interesting challenge. The best technique depends on the type of each attribute (text, number, date, etc.). Only use LevenshteinDist if the size of the text in different attributes is very similar. If sizes are different, than tokenizing and doing a Cosine distance is better, however, this functionality is not available in Qlik. When I need this, I do it in Python and export the data so I can use it in Qlik.

Check out my latest posts at datavoyagers.net
SDT
Creator
Creator
Author

@igoralcantara Thank you. I had not thought about the varying lengths of the data (all strings). I may need to use LevenshteinDist on each of the 12 attributes and add up the total. Python is not an option for this application.

igoralcantara
Partner - Specialist
Partner - Specialist

Give me more details. You might want to average it instead of summed or maybe average the squared distances.

Check out my latest posts at datavoyagers.net
SDT
Creator
Creator
Author

I put in a Master Measure for now. It is a start, but not great.

I Aggregated on the ItemIDs and Summed up IF(Attribute1Item1=Attribute1Item2,1,0).

It is giving the team a starting point. Score from 1 to 11 for 'exact' matching attributes.