Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello,
I have a table with tens of thousands of Items. Each Item has 12 attributes (separate field for each attribute).
I would like to have the user select one item and get a resulting list of the most similar items based on matching attributes.
I am thinking I need to use a full outer join and LevenshteinDist on a concatenated field of the 12 attributes. I am hoping there is a better way since that will be a massive table.
Another way might be to compare each of the 12 attribute fields separately in a full outer join and then total up the # of matching attributes (certainly this would be more helpful in the real world scenario).
Hoping someone has a more elegant (efficient) solution?
@igoralcantara this may be a good challenge for you.
Read more at Data Voyagers - datavoyagers.net
Follow me on my LinkedIn | Know IPC Global at ipc-global.com
This is an interesting challenge. The best technique depends on the type of each attribute (text, number, date, etc.). Only use LevenshteinDist if the size of the text in different attributes is very similar. If sizes are different, than tokenizing and doing a Cosine distance is better, however, this functionality is not available in Qlik. When I need this, I do it in Python and export the data so I can use it in Qlik.
@igoralcantara Thank you. I had not thought about the varying lengths of the data (all strings). I may need to use LevenshteinDist on each of the 12 attributes and add up the total. Python is not an option for this application.
Give me more details. You might want to average it instead of summed or maybe average the squared distances.
I put in a Master Measure for now. It is a start, but not great.
I Aggregated on the ItemIDs and Summed up IF(Attribute1Item1=Attribute1Item2,1,0).
It is giving the team a starting point. Score from 1 to 11 for 'exact' matching attributes.