Skip to main content
Announcements
Join us at Qlik Connect for 3 magical days of learning, networking,and inspiration! REGISTER TODAY and save!
cancel
Showing results for 
Search instead for 
Did you mean: 
kanonkop
Contributor
Contributor

Match Contents of two different Strings

Good Day,

I am trying to compare contents of one string value in one table against the contents of another string value in a different table. Based on the matched values it must pick the best matched value.

For example TABLE 1 has multiple values (String 1) separated by ','

TABLE 1
KeyString 1
0001428,459,5A1,5AC,5AL,5DM,609,650,688,6AC,6AE,6AK,6AM,6AN,6AP,6CP,6NS,710
0002428,459,481,493,4AW,4ML,4NE,4U0,4UR,534,548,552,5A1,5AC,6AP,6CP,6NS,715
0003428,431,459,481,498,4AW,4ML,4NE,4U0,4UR,534,548,552,5A1,5AC,6AP,6CP,6NS
0004428,432,453,482,493,4AW,4ML,4NE,4U0,4UR,534,548,552,5A1,5AC,6AP,6CP,6NS


A second table with some string values also...

TABLE 2
String 2Value
42820
428,45922
428,459,71025
428,431,459,71030
428,431,49333
428,431,45938


What I am trying to accomplish is to compare the 2 string fields in some way, I have tried breaking it down using subfield, so there are multiple records and then use the subfield values as key, but it does not return what I want as it links it to all associated string values in table 2. I have tried the use of Match, Wildmatch ect, but probably use it inefficiently.

What must happen is the string value in TABLE 1 must compare all the available options in the string field of TABLE 2 and choose the best matched value in TABLE 2.

Example of the results of above table contents. 

  • The 1st record matched 428,459,710 as the best result in table 2, as all those entries are present in the string field in table 1.
  • Similar for the 2nd record, only 428 & 459 was present in the string field of TABLE 2.
  • ......
RESULT
KeyString 1String 2Value
0001428,459,5A1,5AC,5AL,5DM,609,650,688,6AC,6AE,6AK,6AM,6AN,6AP,6CP,6NS,710428,459,71025
0002428,459,481,493,4AW,4ML,4NE,4U0,4UR,534,548,552,5A1,5AC,6AP,6CP,6NS,715428,45922
0003428,431,459,481,498,4AW,4ML,4NE,4U0,4UR,534,548,552,5A1,5AC,6AP,6CP,6NS428,431,45938
0004428,432,453,482,493,4AW,4ML,4NE,4U0,4UR,534,548,552,5A1,5AC,6AP,6CP,6NS42820

 

Hoping that someone can assist or guide me in some direction, or that my requirement at least make some sense. hehe

Thanks and Regards,

Kanon

1 Solution

Accepted Solutions
tresesco
MVP
MVP

If I haven understood right, you can do this in script. I have given a quick try, please find below the script.

 

Table1:
Load * Inline [
Key, "String 1"
0001, "428,459,5A1,5AC,5AL,5DM,609,650,688,6AC,6AE,6AK,6AM,6AN,6AP,6CP,6NS,710"
0002, "428,459,481,493,4AW,4ML,4NE,4U0,4UR,534,548,552,5A1,5AC,6AP,6CP,6NS,715"
0003, "428,431,459,481,498,4AW,4ML,4NE,4U0,4UR,534,548,552,5A1,5AC,6AP,6CP,6NS"
0004, "428,432,453,482,493,4AW,4ML,4NE,4U0,4UR,534,548,552,5A1,5AC,6AP,6CP,6NS"
];

Join

Table2:
Load *
	,SubField([String 2], ',') as SepString2;
Load * Inline [
"String 2", Value
"428", 20
"428,459", 22
"428,459,710", 25
"428,431,459,710", 30
"428,431,493", 33
"428,431,459", 38
];

NoConcatenate
Final:
Load 
	[String 1],
	FirstSortedValue(DISTINCT Key, -len([String 2])) as Key,
	FirstSortedValue(DISTINCT [String 2], -len([String 2])) as [String 2],
	FirstSortedValue(DISTINCT Value, -len([String 2])) as Value

Where bestMatchFlag = 'PossibleBestMatch' Group By [String 1];


Load
	Key,
	[String 1],
	[String 2],
	Value,
	If(Count(if(matchFlag=1,1))=SubStringCount([String 2],',')+1, 'PossibleBestMatch', 'NO') as bestMatchFlag
Group By  [String 2],[String 1],Key, Value;	

Load
	Key,
	[String 1],
	[String 2],
	Value,
	If(Index([String 1],SepString2)>0,1,0) as matchFlag
Resident Table1;

DROP Table Table1;
	 

 

 Capture.PNG

 

Better solution could be there though. Not invested much time in search of one.😋

View solution in original post

8 Replies
kanonkop
Contributor
Contributor
Author

Howdy...

Tried a few things again,  but still not 100% success, will something like this even be possible within Qlik?
I will continue trying to find a solution though... hehe

jpenuliar
Partner - Specialist III
Partner - Specialist III

Hi,
The example below have managed to return the Keys from Table1 with matches to the strings in Table 2, you only need to add the "Value" to Table3 and use it as look up for Table1:

TempTable2:
Load * Inline [
"String 2", Value
"428", 20
"428,459", 22
"428,459,7102", 25
"428,431,459,710", 30
"428,431,493", 33
"428,431,459", 38
];

NoConcatenate
Table2:
Load
Text(Subfield("String 2",',')) as "String 1"
Resident TempTable2;

Drop Table TempTable2;


Table1:
Load * Inline [
Key, "String 1"
0001, "428,459,5A1,5AC,5AL,5DM,609,650,688,6AC,6AE,6AK,6AM,6AN,6AP,6CP,6NS,710"
0002, "428,459,481,493,4AW,4ML,4NE,4U0,4UR,534,548,552,5A1,5AC,6AP,6CP,6NS,715"
0003, "428,431,459,481,498,4AW,4ML,4NE,4U0,4UR,534,548,552,5A1,5AC,6AP,6CP,6NS"
0004, "428,432,453,482,493,4AW,4ML,4NE,4U0,4UR,534,548,552,5A1,5AC,6AP,6CP,6NS"
];


Inner Join(Table2)
Load
Key,
Text(Subfield("String 1",',')) as "String 1"
Resident Table1;

Drop Table Table1;

NoConcatenate
Table3:
Load Key, Concat(distinct "String 1",', ') as "String 1" Resident Table2
Group By Key;

Drop Table Table2;
kanonkop
Contributor
Contributor
Author

Good Day,

Thank you so much for your reply...

Unfortunately this is the similar results that I am getting which is not yet correct.

With the results below from your example, the matching in theory is correct, however if you look for example at Key 0004 with the value of "428,493" and then look at TempTable2, there is no string value of "428,493". There is however a "428,431,493", which for this instance would be incorrect as well, as the "431" was not present in Table1.

Therefore due to 431 not being available for that record, looking at TempTable2, the only available match would be the 1st entry... "428" with the value of 20 as all the other entries miss one or more string code.

Preview.PNG

Best Regards,

Kanon

jpenuliar
Partner - Specialist III
Partner - Specialist III

wouldn't it be possible to add an "if no match found, then ..." logic after the the step we have done so far?

kanonkop
Contributor
Contributor
Author

I don't think so, as the table gets broken down with subfield, so there is no relation to match against. It's almost like I have to add additional key to associate each record on TempTable2 before subfield.

Then find the best logical match, if one of the values in TempTable2 row entry is not present in one of the records of Table1, it is not a match, and then look for another logical match in TempTable2, as in the one example row where the only logical match was "428" on its own, even though some of the other entries were found, not all was present for that record against the record in TempTable2.

Hehe, feels so close yet sooooooo...

kanonkop
Contributor
Contributor
Author

Good Day,

If this is not possible in load script, would it be possible to do this with set analysis?

Applying some of the logic suggested, this is an example of one of the results.

Only one value must be returned, in the 1st example, the 1st record is the correct one, as the second record has a "823" value in string 2, which is not present in string 1.

The second example even though the value 3AC is present in the 1st record, the 2nd record is more accurate as the value "823" is present in string 1 as well, therefore the 1st record should be ignored.

Preview_01.PNG

Regards

tresesco
MVP
MVP

If I haven understood right, you can do this in script. I have given a quick try, please find below the script.

 

Table1:
Load * Inline [
Key, "String 1"
0001, "428,459,5A1,5AC,5AL,5DM,609,650,688,6AC,6AE,6AK,6AM,6AN,6AP,6CP,6NS,710"
0002, "428,459,481,493,4AW,4ML,4NE,4U0,4UR,534,548,552,5A1,5AC,6AP,6CP,6NS,715"
0003, "428,431,459,481,498,4AW,4ML,4NE,4U0,4UR,534,548,552,5A1,5AC,6AP,6CP,6NS"
0004, "428,432,453,482,493,4AW,4ML,4NE,4U0,4UR,534,548,552,5A1,5AC,6AP,6CP,6NS"
];

Join

Table2:
Load *
	,SubField([String 2], ',') as SepString2;
Load * Inline [
"String 2", Value
"428", 20
"428,459", 22
"428,459,710", 25
"428,431,459,710", 30
"428,431,493", 33
"428,431,459", 38
];

NoConcatenate
Final:
Load 
	[String 1],
	FirstSortedValue(DISTINCT Key, -len([String 2])) as Key,
	FirstSortedValue(DISTINCT [String 2], -len([String 2])) as [String 2],
	FirstSortedValue(DISTINCT Value, -len([String 2])) as Value

Where bestMatchFlag = 'PossibleBestMatch' Group By [String 1];


Load
	Key,
	[String 1],
	[String 2],
	Value,
	If(Count(if(matchFlag=1,1))=SubStringCount([String 2],',')+1, 'PossibleBestMatch', 'NO') as bestMatchFlag
Group By  [String 2],[String 1],Key, Value;	

Load
	Key,
	[String 1],
	[String 2],
	Value,
	If(Index([String 1],SepString2)>0,1,0) as matchFlag
Resident Table1;

DROP Table Table1;
	 

 

 Capture.PNG

 

Better solution could be there though. Not invested much time in search of one.😋

kanonkop
Contributor
Contributor
Author

Thank you so much, I have so much to learn still. I have tested your code and the results is perfect, I am implementing the code in my application for further testing.

I managed to get it working with below code, please don't laugh, then again you might as well... hehe, was going to ask if doing so many resident loads is optimal, but guess my lack of knowledge is probably not the best way to ask. 🙂

Spoiler

TempTable2:
Load * Inline [
"String 2", Value
"428", 20
"428,459", 22
"428,459,710", 25
"428,431,459,710", 30
"428,431,493", 33
"428,431,459", 38
];

[TABLE]:
load *,
text([String 2]) as String2,
text(SubField([String 2],',')) as StrKey,
substringcount(trim([String 2]), ',') + 1 as Count_Str2
resident TempTable2;

drop field [String 2];
drop table TempTable2;

TempTable1:
Load * Inline [
ID, "String 1"
0001, "428,459,5A1,5AC,5AL,5DM,609,650,688,6AC,6AE,6AK,6AM,6AN,6AP,6CP,6NS,710"
0002, "428,459,481,493,4AW,4ML,4NE,4U0,4UR,534,548,552,5A1,5AC,6AP,6CP,6NS,715"
0003, "428,431,459,481,498,4AW,4ML,4NE,4U0,4UR,534,548,552,5A1,5AC,6AP,6CP,6NS"
0004, "428,432,453,482,493,4AW,4ML,4NE,4U0,4UR,534,548,552,5A1,5AC,6AP,6CP,6NS"
];

left join([TABLE])

Load *,
text([String 1]) as String1,
text(SubField([String 1],',')) as StrKey
Resident [TempTable1];

drop field [String 1];
drop field StrKey;
drop table TempTable1;

[TEMP]:
load *,
if(len(trim([String1]))<>0,text(SubField([String1],','))) AS [String_1],
if(len(trim([String2]))<>0,text(SubField([String2],','))) AS [String_2]
Resident TABLE;

drop table TABLE;

[MATCH]:
load *,
if([String_1] = [String_2],1,0) as MATCHED
resident TEMP order by ID, [String_1];

Drop table TEMP;

NoConcatenate
[Grr]:
load distinct *,
Text(ID & String2) as KEY Resident [MATCH] where MATCHED > 0;
drop table [MATCH];

Right Join

load KEY,
sum(MATCHED) as Please
resident [Grr]
group by KEY;

NoConcatenate
[Dude]:
load distinct * resident Grr where Please = Count_Str2
order by ID, Please desc;

drop table Grr;

[AiAiAi]:
load *,
IF(Rowno() = 1,1,
IF(ID = Previous(ID),0,1)) as FirstValue
resident Dude;

drop table Dude;

NoConcatenate
[ThisIsIt]:
load distinct ID,String1,String2,Value resident [AiAiAi] where FirstValue > 0;

drop table AiAiAi;

Again thanks for your time in helping here, much appreciated.

Kanon