Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I'm using the code below to take a first and last name from one table (135K records)
and search 2.15 million records in a 2nd table for the first and last name to be in a field called FullName.
I tried building a crossreference table and it was too much for the server so resorted to doing it one by one
but it is taking forever (estimating 30 hours) at about one sec per search. Also the DO WHILE LOOP code has some kind of bug becasue at the end of my sample runs, the next command is skipped (with or without a semicolon after LOOP)
WL_CANDIDATES:
LOAD rowNum, WL_ID, fName, lName, lenfName
FROM ..\QVD\WL_FirstLastCandidates.QVD (qvd);
CUST_DATA:
LOAD IND_PARTY_KEY, FullName
FROM ..\QVD\CUST_DATA_SEARCH.QVD (qvd);
LET MAX = NoOfRows('WL_CANDIDATES');
SET lowRnge=0;
DO WHILE lowRnge<MAX
LET fName = Peek('fName',$(lowRnge),'WL_CANDIDATES');
LET lName = Peek('lName',$(lowRnge),'WL_CANDIDATES');
LET wlID = Peek('WL_ID',$(lowRnge),'WL_CANDIDATES');
HITLIST:
LOAD
IND_PARTY_KEY,
Num($(wlID)) as WL_ID,
'Y' as matchFound
Resident CUST_DATA
WHERE index(FullName,'$(fName)')>0
AND index(Mid(FullName,index(FullName,'$(fName)') + 1 + Len('$(fName)')),'$(lName)')>0;
LET lowRnge=lowRnge+1;
LOOP
Trace wlID = $(wlID);
Drop Table WL_CANDIDATES;
STORE HITLIST INTO ..\QVD\HITLIST.QVD (QVD);
DROP TABLES HITLIST, CUST_DATA;
// Interesting anomaly: The first time I ran this I had a lot of code after the loop, so I put an “exit script” after the LOOP.
// When it ran, the exit script command was completely ignored and it proceeded to run the rest of the code.
// I then moved this code to the end. I still needed to drop a temp table so initially the last command was: Drop Table WL_CANDIDATES;
// I added a few more commands as shown below in the hopes that if it skipped over the first one, the nextwould actually execute.
Trace wlID = $(wlID);
Drop Table WL_CANDIDATES;
STORE HITLIST INTO ..\QVD\HITLIST.QVD (QVD);
DROP TABLES HITLIST, CUST_DATA;
// Unfortunately, I never got to see this work as the process never completed again. It ran for 5 hours and completed around 3000 of the 135,502 records.
Can anyone please recommend more efficient code that would do this quicker?
That's true, the script will find any combinations of search values.
I don't think that Mapsubstring allows wildcards, but you can try a two step approach, something like this:
WL_CANDIDATES:
LOAD *, recno() as CandID INLINE [
fName, lName
A, Meyer
A, Lee
Amit, Hubar
David, John
];
MAP1:
MAPPING LOAD
fName, '\'&fName&'/'
Resident WL_CANDIDATES
CUST_DATA:
LOAD *, recno() as CustID
INLINE [
FullName
Mr Alexander Lee
Ami Glee
Happy Lee
Amit Chin Hubarendi
Kamit Hubar
Davidson Johnson
];
Match:
LOAD *,
Textbetween(Match1,'\','/') as fName,
Index(Match1,'/')+1 as Index;
LOAD *, Mapsubstring('MAP1',FullName) as Match1
RESIDENT CUST_DATA
WHERE len(Mapsubstring('MAP1',FullName));
LEFT JOIN (Match)
LOAD fName, lName as Search2
Resident WL_CANDIDATES;
FinalMatch:
LOAD CustID, 'Yes' as Match
RESIDENT Match
WHERE Index(FullName,Search2) >Index;
DROP TABLE Match;
Still, since the MapSubstring will map against the first matching key that is found, you may miss a matching combination of fName and lName to FullName if there is a fName match earlier in your mapping table. So I think you need to sort your candidates fName most significant first (i.e. by len(fName) desc).
You can also try if Rob's cookbook solution for a generated WildMatch() is working for you:
http://qlikviewcookbook.com/download/mapping-with-wildcards/
Another option would be to full outer join fName, lName table with FullName table, then do a WildMatch to find all matches. This will create a larger temporary table, but should still be much better performing than your loop (on a decent well equipped environment).
I think the best solution you will get with a JScript macro. In a macro you can use regular expressions, it requires only few resources and it's very fast.
In the script you have to export the necessary fields to csv files, calling the macro and load the result:
STORE WL_CANDIDATES into [.\WL_CANDIDATES.csv](txt, delimiter is ';');
STORE FULLNAME into [.\FULLNAME.csv](txt);
let ret=MatchData('$(QvWorkPath)');
Left Join (CUST_DATA)
LOAD FullName,
ID,
'Yes' as Match
FROM [.\FULLNAME_Match.csv](txt, utf8, embedded labels, delimiter is ';', msq);
The JScript macro:
function MatchData(path){
var inStream = new ActiveXObject("ADODB.Stream");
inStream.CharSet = "utf-8";
inStream.Open;
inStream.LoadFromFile(path+"\\WL_CANDIDATES.csv");
inStream.skipline();
var s,sa,i;
var regex=[];
while (!inStream.eos){
s = inStream.ReadText(-2);
sa=s.split(";");
regex[Number(sa[0])]=new RegExp(sa[1]+".* "+sa[2]);
}
var outStream = new ActiveXObject("ADODB.Stream");
outStream.CharSet = "utf-8";
outStream.Open;
outStream.WriteText("FullName;ID",1);
inStream.LoadFromFile(path+"\\FULLNAME.csv");
inStream.skipline();
while (!inStream.eos){
s = inStream.ReadText(-2);
for (i = 0; i < regex.length; i++) {
if(s.match(regex) != null){
outStream.WriteText(s+";"+i,1);
break;
}
}
}
inStream.Close();
outStream.SaveToFile(path+"\\FULLNAME_Match.csv",2);
outStream.Close();
return 0
}
Hi,
another (very slow either) solution might be:
tabWL_CANDIDATES:
LOAD *,
'* '&FirstName&'* '&LastName&'*' as WildMatchStr
FROM
(qvd); tabCUST_DATA:
LOAD *,
' '&FullName as FullName2
FROM
(qvd); tabLink:
LOAD Distinct * Where not IsNull(WildMatchStr);
LOAD ID2,
If(FullName2 like FieldValue('WildMatchStr',IterNo()),FieldValue('WildMatchStr',IterNo())) as WildMatchStr
Resident tabCUST_DATA
While IterNo()<=FieldValueCount('WildMatchStr');
Join
LOAD ID1,
WildMatchStr
Resident tabWL_CANDIDATES;
DROP Fields WildMatchStr, FullName2;
maybe it helps nevertheless
regards
Marco