Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

Sorting similar names for display

Hello,

I have instances of IDs and Names where the name values are very similar - in some cases, they match exactly with one or more other instance (it may be 1 other, 2 other, unlimited other instances) - the IDs are unique. A sample of the data is below:

  

IDNAME
123456A- Name
123457A Name
123458A+ Name
111410ABC Co - 2006-11
111411ABC Co - 2006-12
111418ABC Co - 2007-11
111420ABC Co - 2007-11
198220XYZ - City A
198227XYZ - City A-
199200MMM Co.
199210MMM Co.
199220MMM Co.

How can I extract these IDs and Names, group the Names together by their other instances and display them in a way that clearly shows the grouped instances together (perhaps in a pivot table)? For example, I would like to display the 3 instances of 'MMM Co.' grouped together so the user knows all 3 instances are together.

Many thanks

6 Replies
settu_periasamy
Master III
Master III

May be like this?

T1:

LOAD * INLINE [

    ID, NAME

    123456, A- Name

    123457, A Name

    123458, A+ Name

    111410, ABC Co - 2006-11

    111411, ABC Co - 2006-12

    111418, ABC Co - 2007-11

    111420, ABC Co - 2007-11

    198220, XYZ - City A

    198227, XYZ - City A-

    199200, MMM Co.

    199210, MMM Co.

    199220, MMM Co.

];

T2:

Left Join(T1)

LOAD NAME,if(Count(NAME)>1,'Yes','No') as Flag Resident T1 Group by NAME;

Capture.JPG

sunny_talwar

MMM Co. was straight forward because it was an exact match? Are you also looking to combine XYZ - City A and XYZ - City A- Since they are almost very close?

MarcoWedel

Hi,

one solution could be to calculate a group number field (e.g. using the Levenshtein algorithm as described here: Levenshtein Algorithm) and to use this field to group and/or colour code NAME values using a straight or pivot table:

QlikCommunity_Thread_215087_Pic1.JPG

QlikCommunity_Thread_215087_Pic2.JPG

QlikCommunity_Thread_215087_Pic3.JPG

QlikCommunity_Thread_215087_Pic4.JPG

QlikCommunity_Thread_215087_Pic5.JPG

table1:

LOAD * INLINE [

    ID, NAME

    123456, A- Name

    123457, A Name

    123458, A+ Name

    111410, ABC Co - 2006-11

    111411, ABC Co - 2006-12

    111418, ABC Co - 2007-11

    111420, ABC Co - 2007-11

    198220, XYZ - City A

    198227, XYZ - City A-

    199200, MMM Co.

    199210, MMM Co.

    199220, MMM Co.

];

table2:

LOAD Distinct NAME Resident table1;

Join

LOAD Distinct NAME as NAME2 Resident table1;

Join

LOAD NAME, NAME2, levenshtein(NAME, NAME2) as LevDistNAME Resident table2;

Right Join

LOAD Distinct LevDistNAME Resident table2 Where LevDistNAME<=2;

Join (table1)

LOAD NAME,

     AutoNumber(Concat(NAME2,'/')) as NameGroup

Resident table2

Group By NAME;

DROP Table table2;

hope this helps

regards

Marco

Anonymous
Not applicable
Author

Hi Marco,

I am VERY interested in your solution - never heard of the Levenshtein algorithm before - but before I try implementing your suggestion, the Name fields I provided are just examples - in reality, the Names won't be hardcoded in an Inline Load, they'll be retrieved from a QVD, so I'll have no idea of the actual names nor how many there are - will your suggestion still work in this case?

Thanks so much.

Anonymous
Not applicable
Author

Yes Sunny, I am looking to combine names that are identical and I am looking to combine names that are similar. All the matching logic has already been done in the underlying SQL code - the SQL resultset is exported to a QVD, and my QVW will retrieve the data from the QVD, and I now need to find a way to group these names together in a graphical way.

Thanks.

MarcoWedel

Hi,

I used the inline load only to get some test data. You can replace it with a QVD load in a real application.

An issue might rather be the performance for large tables / many distinct Names.

You could try to replace the cartesian product (Join of tables without common fields) with a loop. The following example reduced rather the memory footprint of the load instead of the runtime though (again loading/generating some sample data / to replace with QVD load):

QlikCommunity_Thread_215087_Pic6.JPG

table1:

LOAD RecNo() as ID,

    KeepChar(Hash128(RecNo()),'ABCDEFGHIJKLMNOPQRSTUVWXYZ') as NAME

AutoGenerate 10000;

tabNames:

LOAD Distinct NAME as NAME2 Resident table1;

FOR vCnt = 1 to FieldValueCount('NAME2')

LET vName = FieldValue('NAME2',vCnt);

table2:

LOAD *

Where LevDistNAME<=2;

LOAD NAME2,

      '$(vName)' as NAME,

      levenshtein('$(vName)', NAME2) as LevDistNAME

Resident tabNames;

NEXT vCnt

DROP Table tabNames;

Join (table1)

LOAD NAME,

    AutoNumber(Concat(NAME2,'/')) as NameGroup

Resident table2

Group By NAME;

DROP Table table2;

hope this helps

regards

Marco