Skip to main content
Announcements
Qlik Connect 2025: 3 days of full immersion in data, analytics, and AI. May 13-15 | Orlando, FL: Learn More
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

how to join two table by a fuzzy condition

1.JPG

I want to join [tab1] and [tab2] to get result as [out]

but in [tab1], not every x1 is decided by combination of field a and field d, it can be field a and field c or others.

Pls help, thanks a lot

1 Solution

Accepted Solutions
maxgro
MVP
MVP

if i understand your question


RESULT

1.png




SCRIPT

for testing I add a flag to filter your requirement

you can easily modify the script to keep only the records you need (FlagMatch=1) and/or to put catgory in columns instead of rows


tab1:

load

if(len(trim(s1))=0,'?',s1) & if(len(trim(s2))=0,'?',s2) & if(len(trim(s3))=0,'?',s3) & if(len(trim(s4))=0,'?',s4)

& if(len(trim(s5))=0,'?',s5) & if(len(trim(s6))=0,'?',s6) & if(len(trim(s7))=0,'?',s7) as F,

*

;

load * inline [

s1,s2,s3,s4,s5,s6,s7,category

1,,,1,,,,xx1

1,1,1,1,1,,,xx1-1

2,,,,,,,xx2

5,2,,,,,,xx3

];

tab2:

load

if(len(trim(st1))=0,'-',st1) & if(len(trim(st2))=0,'-',st2) & if(len(trim(st3))=0,'-',st3) & if(len(trim(st4))=0,'-',st4)

& if(len(trim(st5))=0,'-',st5) & if(len(trim(st6))=0,'-',st6) & if(len(trim(st7))=0,'-',st7) as F1,

*;

load * inline [

item, st1,st2,st3,st4,st5,st6,st7

item1,1,1,1,1,1,1,2

item2, 2,,,3,1,,,

];

T:

noconcatenate

load * resident tab1;

join load * Resident tab2;

DROP Table tab1, tab2;

Table:

NoConcatenate

load

rowno() as id,

*,

if(WildMatch(F1, F),1,0) as FlagMatch

Resident T

;

DROP Table T;

View solution in original post

13 Replies
MarcoWedel

Can you describe a bit more in detail please.

maybe a longer example table would help

Not applicable
Author

hi, Marco

thank you for your reply.

I made some example tables with more records such as below.

1.JPG
I want the output can get item, st1~st7 in tab2 and category in tab1.

for example, item1, because st1=1 and st4=1, I can get category1=XX1 according to 1st record in tab1.

and then because st1=1 and st2=1 and st3=1 and st4=1 and st5=1 and st6=1, I can get category2=XX1-1 according to 2nd record in tab2

I don't know how to join tab1.qvd and tab2.qvd to get what i want.

maxgro
MVP
MVP

if i understand your question


RESULT

1.png




SCRIPT

for testing I add a flag to filter your requirement

you can easily modify the script to keep only the records you need (FlagMatch=1) and/or to put catgory in columns instead of rows


tab1:

load

if(len(trim(s1))=0,'?',s1) & if(len(trim(s2))=0,'?',s2) & if(len(trim(s3))=0,'?',s3) & if(len(trim(s4))=0,'?',s4)

& if(len(trim(s5))=0,'?',s5) & if(len(trim(s6))=0,'?',s6) & if(len(trim(s7))=0,'?',s7) as F,

*

;

load * inline [

s1,s2,s3,s4,s5,s6,s7,category

1,,,1,,,,xx1

1,1,1,1,1,,,xx1-1

2,,,,,,,xx2

5,2,,,,,,xx3

];

tab2:

load

if(len(trim(st1))=0,'-',st1) & if(len(trim(st2))=0,'-',st2) & if(len(trim(st3))=0,'-',st3) & if(len(trim(st4))=0,'-',st4)

& if(len(trim(st5))=0,'-',st5) & if(len(trim(st6))=0,'-',st6) & if(len(trim(st7))=0,'-',st7) as F1,

*;

load * inline [

item, st1,st2,st3,st4,st5,st6,st7

item1,1,1,1,1,1,1,2

item2, 2,,,3,1,,,

];

T:

noconcatenate

load * resident tab1;

join load * Resident tab2;

DROP Table tab1, tab2;

Table:

NoConcatenate

load

rowno() as id,

*,

if(WildMatch(F1, F),1,0) as FlagMatch

Resident T

;

DROP Table T;

MarcoWedel

source tables for script tests:

tab1:

st1_1st2_1st3_1st4_1st5_1st6_1st7_1category
11XX1
111111XX1-1
2XX2
52XX3
1111XX4
1511XX5
2311XX6
1112XX7
1311XX8
2311XX8-1

tab2:

itemst1st2st3st4st5st6st7
item11111112
item2231
item31111
item4111
item51111
item6111
item72111
item8131
MarcoWedel

Hi Wei,

here is my solution for this interesting task:

I extended your tables a bit (see above):

QlikCommunity_Thread_112457_Pic3.JPG.jpg

QlikCommunity_Thread_112457_Pic4.JPG.jpg

QlikCommunity_Thread_112457_Pic2.JPG.jpg

SET CustomMatch = if(len($1) > 0 AND len($2) > 0, $1=$2, -1);

tab1:

LOAD * FROM [http://community.qlik.com/thread/112457]

(html, codepage is 1252, embedded labels, table is @1);

tab2:

LOAD * FROM [http://community.qlik.com/thread/112457]

(html, codepage is 1252, embedded labels, table is @2);

NoConcatenate

tabTemp:

LOAD Distinct * Resident tab1;

Join

LOAD Distinct * Resident tab2;

NoConcatenate

tabMatches:

LOAD * Resident tabTemp

Where

  $(CustomMatch(st1_1, st1)) AND

  $(CustomMatch(st2_1, st2)) AND

  $(CustomMatch(st3_1, st3)) AND

  $(CustomMatch(st4_1, st4)) AND

  $(CustomMatch(st5_1, st5)) AND

  $(CustomMatch(st6_1, st6)) AND

  $(CustomMatch(st7_1, st7));

DROP Table tabTemp;

Left Join (tab2)

LOAD Distinct

  st1, st2, st3, st4, st5, st6, st7, category

Resident tabMatches;

DROP Table tabMatches;

tab2Temp:

LOAD

    *,

    'category' & CatNum AS CatNumName;

LOAD

  item, st1, st2, st3, st4, st5, st6, st7, category,

    If(Previous(item)=item, Peek(CatNum) + 1, 1) AS CatNum

RESIDENT tab2

ORDER BY item, category;

DROP Field CatNum;

DROP Table tab2;

tabTemp3:

Generic LOAD

  item, st1, st2, st3, st4, st5, st6, st7,

    CatNumName,

    category

RESIDENT tab2Temp;

tab2:

LOAD Distinct

    item, st1, st2, st3, st4, st5, st6, st7

Resident tab2Temp

ORDER BY item;

DROP Table tab2Temp;

FOR i = 0 to NoOfTables()

  TableList:

  LOAD TableName($(i)) as Tablename AUTOGENERATE 1

  WHERE WildMatch(TableName($(i)), 'tabTemp3.*');

NEXT i

FOR i = 1 to FieldValueCount('Tablename')

  LET vTable = FieldValue('Tablename', $(i));

  LEFT JOIN (tab2) LOAD * RESIDENT $(vTable);

  DROP TABLE $(vTable);

NEXT i

DROP TABLE TableList;

This script uses a generic load to create the categoryx fields and a technique described by Rob Wunderlich to combine the generated tables into one: (Qlikview Notes: Use cases for Generic Load)

I also used some custom matching macro on each field to ensure that field values with different lengths will be treated properly.

Hope you have as much fun testing this solution, as I had writing it.

regards

Marco

Not applicable
Author

Hi, Marco

At the very first of your script,  you create a function:

SET CustomMatch = if(len($1) > 0 AND len($2) > 0, $1=$2, -1);

I'm sorry that I'm a totally green hand, that I'm not very sure my understanding about this function.

Could you please teach me what's this function doing in the where condition as below?

  • tabMatches: 
  • LOAD * Resident tabTemp 
  • Where 
  •   $(CustomMatch(st1_1, st1)) AND 
  •   $(CustomMatch(st2_1, st2)) AND 
  •   $(CustomMatch(st3_1, st3)) AND 
  •   $(CustomMatch(st4_1, st4)) AND 
  •   $(CustomMatch(st5_1, st5)) AND 
  •   $(CustomMatch(st6_1, st6)) AND 
  •   $(CustomMatch(st7_1, st7)); 

Thank you very much!

Wei

Not applicable
Author

Thank you very much

This script works.

But could you tell me what's function 'WildMatch' means?

And I have a little worry about that the number of records in result table will be very large if there are many records in tab1 and tab2, it may make the performance bad.

Wei

MarcoWedel

It's a logical expression that is true, when both arguments are equal or a least one is empty.

So

  • Where
  •   $(CustomMatch(st1_1, st1)) AND
  •   $(CustomMatch(st2_1, st2)) AND
  •   $(CustomMatch(st3_1, st3)) AND
  •   $(CustomMatch(st4_1, st4)) AND
  •   $(CustomMatch(st5_1, st5)) AND
  •   $(CustomMatch(st6_1, st6)) AND
  •   $(CustomMatch(st7_1, st7));

is true when all fields are equal to their corresponding fields where defined.


regards


Marco

maxgro
MVP
MVP

- wildmatch, from online help

wildmatch( str, expr1 [ , expr2,...exprN ] )

The wildmatch function performs a case insensitive comparison and permits the use of wildcard characters ( * and ?) in the comparison strings.

Example:

wildmatch( M, 'ja*','fe?','mar')

returns 1 if M = January

returns 2 if M = fex

in your case I used it to match st of tab1 with st of tab2


- performance

the table T is the cartesian product; it could be large, depends on records of tab1 and tab2; you have to try with your real tables and real hardware and let we know if you have some space or performance problem