I have instances of IDs and Names where the name values are very similar - in some cases, they match exactly with one or more other instance (it may be 1 other, 2 other, unlimited other instances) - the IDs are unique. A sample of the data is below:
ABC Co - 2006-11
ABC Co - 2006-12
ABC Co - 2007-11
ABC Co - 2007-11
XYZ - City A
XYZ - City A-
How can I extract these IDs and Names, group the Names together by their other instances and display them in a way that clearly shows the grouped instances together (perhaps in a pivot table)? For example, I would like to display the 3 instances of 'MMM Co.' grouped together so the user knows all 3 instances are together.
one solution could be to calculate a group number field (e.g. using the Levenshtein algorithm as described here: Levenshtein Algorithm) and to use this field to group and/or colour code NAME values using a straight or pivot table:
LOAD * INLINE [
123456, A- Name
123457, A Name
123458, A+ Name
111410, ABC Co - 2006-11
111411, ABC Co - 2006-12
111418, ABC Co - 2007-11
111420, ABC Co - 2007-11
198220, XYZ - City A
198227, XYZ - City A-
199200, MMM Co.
199210, MMM Co.
199220, MMM Co.
LOAD Distinct NAME Resident table1;
LOAD Distinct NAME as NAME2 Resident table1;
LOAD NAME, NAME2, levenshtein(NAME, NAME2) as LevDistNAME Resident table2;
LOAD Distinct LevDistNAME Resident table2 Where LevDistNAME<=2;
I am VERY interested in your solution - never heard of the Levenshtein algorithm before - but before I try implementing your suggestion, the Name fields I provided are just examples - in reality, the Names won't be hardcoded in an Inline Load, they'll be retrieved from a QVD, so I'll have no idea of the actual names nor how many there are - will your suggestion still work in this case?
Yes Sunny, I am looking to combine names that are identical and I am looking to combine names that are similar. All the matching logic has already been done in the underlying SQL code - the SQL resultset is exported to a QVD, and my QVW will retrieve the data from the QVD, and I now need to find a way to group these names together in a graphical way.
I used the inline load only to get some test data. You can replace it with a QVD load in a real application.
An issue might rather be the performance for large tables / many distinct Names.
You could try to replace the cartesian product (Join of tables without common fields) with a loop. The following example reduced rather the memory footprint of the load instead of the runtime though (again loading/generating some sample data / to replace with QVD load):
LOAD RecNo() as ID,
KeepChar(Hash128(RecNo()),'ABCDEFGHIJKLMNOPQRSTUVWXYZ') as NAME