Skip to main content
Announcements
Live today at 11 AM ET. Get your questions about Qlik Connect answered, or just listen in. SIGN UP NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

Sorting similar names for display

Hello,

I have instances of IDs and Names where the name values are very similar - in some cases, they match exactly with one or more other instance (it may be 1 other, 2 other, unlimited other instances) - the IDs are unique. A sample of the data is below:

  

IDNAME
123456A- Name
123457A Name
123458A+ Name
111410ABC Co - 2006-11
111411ABC Co - 2006-12
111418ABC Co - 2007-11
111420ABC Co - 2007-11
198220XYZ - City A
198227XYZ - City A-
199200MMM Co.
199210MMM Co.
199220MMM Co.

How can I extract these IDs and Names, group the Names together by their other instances and display them in a way that clearly shows the grouped instances together (perhaps in a pivot table)? For example, I would like to display the 3 instances of 'MMM Co.' grouped together so the user knows all 3 instances are together.

Many thanks

6 Replies
settu_periasamy
Master III
Master III

May be like this?

T1:

LOAD * INLINE [

    ID, NAME

    123456, A- Name

    123457, A Name

    123458, A+ Name

    111410, ABC Co - 2006-11

    111411, ABC Co - 2006-12

    111418, ABC Co - 2007-11

    111420, ABC Co - 2007-11

    198220, XYZ - City A

    198227, XYZ - City A-

    199200, MMM Co.

    199210, MMM Co.

    199220, MMM Co.

];

T2:

Left Join(T1)

LOAD NAME,if(Count(NAME)>1,'Yes','No') as Flag Resident T1 Group by NAME;

Capture.JPG

sunny_talwar

MMM Co. was straight forward because it was an exact match? Are you also looking to combine XYZ - City A and XYZ - City A- Since they are almost very close?

MarcoWedel

Hi,

one solution could be to calculate a group number field (e.g. using the Levenshtein algorithm as described here: Levenshtein Algorithm) and to use this field to group and/or colour code NAME values using a straight or pivot table:

QlikCommunity_Thread_215087_Pic1.JPG

QlikCommunity_Thread_215087_Pic2.JPG

QlikCommunity_Thread_215087_Pic3.JPG

QlikCommunity_Thread_215087_Pic4.JPG

QlikCommunity_Thread_215087_Pic5.JPG

table1:

LOAD * INLINE [

    ID, NAME

    123456, A- Name

    123457, A Name

    123458, A+ Name

    111410, ABC Co - 2006-11

    111411, ABC Co - 2006-12

    111418, ABC Co - 2007-11

    111420, ABC Co - 2007-11

    198220, XYZ - City A

    198227, XYZ - City A-

    199200, MMM Co.

    199210, MMM Co.

    199220, MMM Co.

];

table2:

LOAD Distinct NAME Resident table1;

Join

LOAD Distinct NAME as NAME2 Resident table1;

Join

LOAD NAME, NAME2, levenshtein(NAME, NAME2) as LevDistNAME Resident table2;

Right Join

LOAD Distinct LevDistNAME Resident table2 Where LevDistNAME<=2;

Join (table1)

LOAD NAME,

     AutoNumber(Concat(NAME2,'/')) as NameGroup

Resident table2

Group By NAME;

DROP Table table2;

hope this helps

regards

Marco

Anonymous
Not applicable
Author

Hi Marco,

I am VERY interested in your solution - never heard of the Levenshtein algorithm before - but before I try implementing your suggestion, the Name fields I provided are just examples - in reality, the Names won't be hardcoded in an Inline Load, they'll be retrieved from a QVD, so I'll have no idea of the actual names nor how many there are - will your suggestion still work in this case?

Thanks so much.

Anonymous
Not applicable
Author

Yes Sunny, I am looking to combine names that are identical and I am looking to combine names that are similar. All the matching logic has already been done in the underlying SQL code - the SQL resultset is exported to a QVD, and my QVW will retrieve the data from the QVD, and I now need to find a way to group these names together in a graphical way.

Thanks.

MarcoWedel

Hi,

I used the inline load only to get some test data. You can replace it with a QVD load in a real application.

An issue might rather be the performance for large tables / many distinct Names.

You could try to replace the cartesian product (Join of tables without common fields) with a loop. The following example reduced rather the memory footprint of the load instead of the runtime though (again loading/generating some sample data / to replace with QVD load):

QlikCommunity_Thread_215087_Pic6.JPG

table1:

LOAD RecNo() as ID,

    KeepChar(Hash128(RecNo()),'ABCDEFGHIJKLMNOPQRSTUVWXYZ') as NAME

AutoGenerate 10000;

tabNames:

LOAD Distinct NAME as NAME2 Resident table1;

FOR vCnt = 1 to FieldValueCount('NAME2')

LET vName = FieldValue('NAME2',vCnt);

table2:

LOAD *

Where LevDistNAME<=2;

LOAD NAME2,

      '$(vName)' as NAME,

      levenshtein('$(vName)', NAME2) as LevDistNAME

Resident tabNames;

NEXT vCnt

DROP Table tabNames;

Join (table1)

LOAD NAME,

    AutoNumber(Concat(NAME2,'/')) as NameGroup

Resident table2

Group By NAME;

DROP Table table2;

hope this helps

regards

Marco