Skip to main content
Announcements
Qlik Connect 2024! Seize endless possibilities! LEARN MORE
cancel
Showing results for 
Search instead for 
Did you mean: 
umashankarus
Contributor III
Contributor III

Group by a column based on fuzzy match

Hi Experts

I have a table with several data columns.

And one of the field is a string type. (This field is feed from a data source where free form description allowed)

SOURCE_TABLE:

F1, F2, F3

ABC , 100, $20000

abc Inc., 220, $25000

Abc Plastics, 230, $23000

3M, 100, 12000

3M Care, 175, 6500

And the real data contains 100s of variations of F1

In this example, I want the resultant table with F1 and F3, grouped by F1

RESULTANT_TABLE:

F1, F2

ABC, $68000

3M, $18500

Can the resultant F1 be stored in a mapping table

and

add resultant F1 as a column to the source table through apply map based on 'fuzzy' match ?

If so, can you pl share sample script ?

Also pl suggest other options to achieve this ?

Thanks,

Umashankar

1 Solution

Accepted Solutions
ahaahaaha
Partner - Master
Partner - Master

Hi Umashankar,

As an variant

//load the source table

TABLE:

LOAD * Inline [

F1, F2, F3

ABC , 100, $20000

abc Inc., 220, $25000

Abc Plastics, 230, $23000

3M, 100, 12000

3M Care, 175, 6500];

NoConcatenate

SOURCE_TABLE:

LOAD

F1,

Lower(SubField(F1, ' ', 1)) as F1Key, //create key

F2,

Money#(F3, '$#') as F3                //convert money format

Resident TABLE;

Left Join

LOAD

F1Key,

Money(Sum(F3), '$ #,##') as SumF3     //convert money format

Resident SOURCE_TABLE

Group By F1Key;

DROP Table TABLE;

//Then you can remove the unneeded fields from the resulting table

//DROP Fields ... From SOURCE_TABLE;

Result

1.jpg

Regards,

Andrey

View solution in original post

13 Replies
Anonymous
Not applicable

You have to create a lookup table for values in the F1 and the value it should take in the resultant table. Can  use Applymap and create a table and group by.

TABLE:

load * inline [

F1, F2, F3

ABC , 100, $20000

abc Inc., 220, $25000

Abc Plastics, 230, $23000

3M, 100, 12000

3M Care, 175, 6500];

//Creating a mapping table

lookup:

Mapping

load * inline

[

F1,F1_value

ABC,ABC

abc Inc.,ABC

Abc Plastics,ABC

3M,3M

3M Care,3M];

FinalTable:

LEFT join(TABLE)

load F1,ApplyMap('lookup',F1) AS RES_F1 Resident TABLE;

lEFT JOIN(TABLE)

LOAD RES_F1,SUM(F3)

Resident TABLE

GROUP BY RES_F1;

Hope it helps!!

ahaahaaha
Partner - Master
Partner - Master

Hi Umashankar,

As an variant

//load the source table

TABLE:

LOAD * Inline [

F1, F2, F3

ABC , 100, $20000

abc Inc., 220, $25000

Abc Plastics, 230, $23000

3M, 100, 12000

3M Care, 175, 6500];

NoConcatenate

SOURCE_TABLE:

LOAD

F1,

Lower(SubField(F1, ' ', 1)) as F1Key, //create key

F2,

Money#(F3, '$#') as F3                //convert money format

Resident TABLE;

Left Join

LOAD

F1Key,

Money(Sum(F3), '$ #,##') as SumF3     //convert money format

Resident SOURCE_TABLE

Group By F1Key;

DROP Table TABLE;

//Then you can remove the unneeded fields from the resulting table

//DROP Fields ... From SOURCE_TABLE;

Result

1.jpg

Regards,

Andrey

umashankarus
Contributor III
Contributor III
Author

Thanks for the reply

Since F1 variations are more,  we would need dynamic expression to handle

umashankarus
Contributor III
Contributor III
Author

Thanks for the reply

The suggestion helps

Additionally, if there are data variations / spelling mismatch (within Subfield)....is there a way to handle by means of fuzzy match within the expression ?

example data for F1

BOTICELLI

BOTTICELLI Inc

BOTICELI Brands

ahaahaaha
Partner - Master
Partner - Master

For this it is necessary to apply a mapping table. Look at this, please here  Mapping ‒ QlikView

and ApplyMap ‒ QlikView

In my view try to identify errors in the words automatically is not very reasonable. If two words differ by one or more letters and have different meanings - is not always clear, it is a mistake or not.

Regards,

Andrey

Anonymous
Not applicable

I think . Unless you have a solid lookup table, we cant always trust the numbers you get. Its hard to find out each case where data anomaly occurs and use some function to group it.

ahaahaaha
Partner - Master
Partner - Master

You still want to automatically link

BOTICELLI

BOTTICELLI Inc

BOTICELI Brands  in a single value?

umashankarus
Contributor III
Contributor III
Author

Yes, I would like to link in a single value

Thanks