Skip to main content
Announcements
Qlik Connect 2025: 3 days of full immersion in data, analytics, and AI. May 13-15 | Orlando, FL: Learn More
cancel
Showing results for 
Search instead for 
Did you mean: 
tschullo
Creator III
Creator III

Search 2 million records for names from a second source table

I'm using the code below to take a first and last name from one table (135K records)

and search 2.15 million records in a 2nd table for the first and last name to be in a field called FullName.

I tried building a crossreference table and it was too much for the server so resorted to doing it one by one

but it is taking forever (estimating 30 hours) at about one sec per search. Also the DO WHILE LOOP code has some kind of bug becasue at the end of my sample runs, the next command is skipped (with or without a semicolon after LOOP)

WL_CANDIDATES:
LOAD rowNum,      WL_ID,      fName,      lName,      lenfName
FROM ..\QVD\WL_FirstLastCandidates.QVD (qvd);

CUST_DATA:
LOAD IND_PARTY_KEY,     FullName
FROM ..\QVD\CUST_DATA_SEARCH.QVD (qvd);

LET MAX = NoOfRows('WL_CANDIDATES');
SET lowRnge=0;

DO WHILE lowRnge<MAX

LET fName = Peek('fName',$(lowRnge),'WL_CANDIDATES');
LET lName = Peek('lName',$(lowRnge),'WL_CANDIDATES');
LET wlID = Peek('WL_ID',$(lowRnge),'WL_CANDIDATES');

HITLIST:
LOAD
IND_PARTY_KEY,
Num($(wlID)) as WL_ID,
'Y' as matchFound
Resident CUST_DATA
WHERE index(FullName,'$(fName)')>0
AND index(Mid(FullName,index(FullName,'$(fName)') + 1 + Len('$(fName)')),'$(lName)')>0;

LET lowRnge=lowRnge+1;

LOOP

Trace wlID = $(wlID);
Drop Table WL_CANDIDATES;

STORE HITLIST INTO ..\QVD\HITLIST.QVD (QVD);
DROP TABLES HITLIST, CUST_DATA;

 

// Interesting anomaly: The first time I ran this I had a lot of code after the loop, so I put an “exit script” after the LOOP.
// When it ran, the exit script command was completely ignored and it proceeded to run the rest of the code.
// I then moved this code to the end.  I still needed to drop a temp table so initially the last command was:
Drop Table WL_CANDIDATES;

// I added a few more commands as shown below in the hopes that if it skipped over the first one, the nextwould actually execute.

 

Trace wlID = $(wlID);
Drop Table WL_CANDIDATES;

STORE HITLIST INTO ..\QVD\HITLIST.QVD (QVD);
DROP TABLES HITLIST, CUST_DATA;

 

// Unfortunately, I never got to see this work as the process never completed again. It ran for 5 hours and completed around 3000 of the 135,502 records.

 

  Can anyone please recommend more efficient code that would do this quicker?

13 Replies
swuehl
MVP
MVP

That's true, the script will find any combinations of search values.

I don't think that Mapsubstring allows wildcards, but you can try a two step approach, something like this:

WL_CANDIDATES:

LOAD *, recno() as CandID INLINE [

fName,         lName

A,   Meyer         

A,             Lee

Amit,          Hubar

David,         John

];

MAP1:

MAPPING LOAD

fName, '\'&fName&'/'

Resident WL_CANDIDATES

CUST_DATA:

LOAD *, recno() as CustID

INLINE [                        

FullName                               

Mr Alexander Lee                       

Ami Glee

Happy Lee

Amit Chin Hubarendi                    

Kamit Hubar

Davidson Johnson

];

Match:

LOAD *,

Textbetween(Match1,'\','/') as fName,

Index(Match1,'/')+1 as Index;

LOAD *, Mapsubstring('MAP1',FullName) as Match1

RESIDENT CUST_DATA

WHERE len(Mapsubstring('MAP1',FullName));

LEFT JOIN (Match)

LOAD fName, lName as Search2

Resident WL_CANDIDATES;

FinalMatch:

LOAD CustID, 'Yes' as Match

RESIDENT Match

WHERE Index(FullName,Search2) >Index;

DROP TABLE Match;

Still, since the MapSubstring will map against the first matching key that is found, you may miss a matching combination of fName and lName to FullName if there is a fName match earlier in your mapping table. So I think you need to sort your candidates fName most significant first (i.e. by len(fName) desc).

swuehl
MVP
MVP

You can also try if Rob's cookbook solution for a generated WildMatch() is working for you:

http://qlikviewcookbook.com/download/mapping-with-wildcards/

Another option would be to full outer join fName, lName table with FullName table, then do a WildMatch to find all matches. This will create a larger temporary table, but should still be much better performing than your loop (on a decent well equipped environment).

cwolf
Creator III
Creator III

I think the best solution you will get with a JScript macro. In a macro you can use regular expressions, it requires only few resources and it's very fast.

In the script you have to export the necessary fields to csv files, calling the macro and load the result:

STORE WL_CANDIDATES into [.\WL_CANDIDATES.csv](txt, delimiter is ';');

STORE FULLNAME into [.\FULLNAME.csv](txt);

let ret=MatchData('$(QvWorkPath)');

Left Join (CUST_DATA)

LOAD FullName,

     ID,

     'Yes' as Match

FROM [.\FULLNAME_Match.csv](txt, utf8, embedded labels, delimiter is ';', msq);

The JScript macro:

function MatchData(path){

    var inStream = new ActiveXObject("ADODB.Stream");

    inStream.CharSet = "utf-8";

    inStream.Open;

    inStream.LoadFromFile(path+"\\WL_CANDIDATES.csv");

    inStream.skipline();

    var s,sa,i;

    var regex=[];

    while (!inStream.eos){

        s = inStream.ReadText(-2);

        sa=s.split(";");

        regex[Number(sa[0])]=new RegExp(sa[1]+".* "+sa[2]);

    }

    var outStream = new ActiveXObject("ADODB.Stream");

    outStream.CharSet = "utf-8";

    outStream.Open;

    outStream.WriteText("FullName;ID",1);

    inStream.LoadFromFile(path+"\\FULLNAME.csv");

    inStream.skipline();

    while (!inStream.eos){

        s = inStream.ReadText(-2);

        for (i = 0; i < regex.length; i++) {

            if(s.match(regex) != null){

                outStream.WriteText(s+";"+i,1);

                break;

            }

        }

    }

    inStream.Close();

    outStream.SaveToFile(path+"\\FULLNAME_Match.csv",2);

    outStream.Close();

    return 0

}

MarcoWedel

Hi,

another (very slow either) solution might be:

QlikCommunity_Thread_255465_Pic1.JPG

tabWL_CANDIDATES:

LOAD *,

    '* '&FirstName&'* '&LastName&'*' as WildMatchStr

FROM (qvd);

tabCUST_DATA:

LOAD *,

    ' '&FullName as FullName2

FROM (qvd);

tabLink:

LOAD Distinct * Where not IsNull(WildMatchStr);

LOAD ID2,

    If(FullName2 like FieldValue('WildMatchStr',IterNo()),FieldValue('WildMatchStr',IterNo())) as WildMatchStr  

Resident tabCUST_DATA

While IterNo()<=FieldValueCount('WildMatchStr');

Join

LOAD ID1,

    WildMatchStr

Resident tabWL_CANDIDATES;

DROP Fields WildMatchStr, FullName2;

maybe it helps nevertheless

regards

Marco