Skip to main content
Announcements
Qlik Connect 2025: 3 days of full immersion in data, analytics, and AI. May 13-15 | Orlando, FL: Learn More
cancel
Showing results for 
Search instead for 
Did you mean: 
tschullo
Creator III
Creator III

Search 2 million records for names from a second source table

I'm using the code below to take a first and last name from one table (135K records)

and search 2.15 million records in a 2nd table for the first and last name to be in a field called FullName.

I tried building a crossreference table and it was too much for the server so resorted to doing it one by one

but it is taking forever (estimating 30 hours) at about one sec per search. Also the DO WHILE LOOP code has some kind of bug becasue at the end of my sample runs, the next command is skipped (with or without a semicolon after LOOP)

WL_CANDIDATES:
LOAD rowNum,      WL_ID,      fName,      lName,      lenfName
FROM ..\QVD\WL_FirstLastCandidates.QVD (qvd);

CUST_DATA:
LOAD IND_PARTY_KEY,     FullName
FROM ..\QVD\CUST_DATA_SEARCH.QVD (qvd);

LET MAX = NoOfRows('WL_CANDIDATES');
SET lowRnge=0;

DO WHILE lowRnge<MAX

LET fName = Peek('fName',$(lowRnge),'WL_CANDIDATES');
LET lName = Peek('lName',$(lowRnge),'WL_CANDIDATES');
LET wlID = Peek('WL_ID',$(lowRnge),'WL_CANDIDATES');

HITLIST:
LOAD
IND_PARTY_KEY,
Num($(wlID)) as WL_ID,
'Y' as matchFound
Resident CUST_DATA
WHERE index(FullName,'$(fName)')>0
AND index(Mid(FullName,index(FullName,'$(fName)') + 1 + Len('$(fName)')),'$(lName)')>0;

LET lowRnge=lowRnge+1;

LOOP

Trace wlID = $(wlID);
Drop Table WL_CANDIDATES;

STORE HITLIST INTO ..\QVD\HITLIST.QVD (QVD);
DROP TABLES HITLIST, CUST_DATA;

 

// Interesting anomaly: The first time I ran this I had a lot of code after the loop, so I put an “exit script” after the LOOP.
// When it ran, the exit script command was completely ignored and it proceeded to run the rest of the code.
// I then moved this code to the end.  I still needed to drop a temp table so initially the last command was:
Drop Table WL_CANDIDATES;

// I added a few more commands as shown below in the hopes that if it skipped over the first one, the nextwould actually execute.

 

Trace wlID = $(wlID);
Drop Table WL_CANDIDATES;

STORE HITLIST INTO ..\QVD\HITLIST.QVD (QVD);
DROP TABLES HITLIST, CUST_DATA;

 

// Unfortunately, I never got to see this work as the process never completed again. It ran for 5 hours and completed around 3000 of the 135,502 records.

 

  Can anyone please recommend more efficient code that would do this quicker?

13 Replies
Anonymous
Not applicable

hi!

just to be clear, you have the table [WL_CANDIDATES] and you want to lookup the fname and lname in [CUST_DATA] and create the field [Full Name] and store this into the [HITLIST] table?

Have you tried doing joins or mapping table?

tschullo
Creator III
Creator III
Author

I'm taking a first and last name from one table (135K records)

and searching 2.15 million records in a 2nd table for the first and last name to be in a field called FullName.

There is no key to join on.

In the end I end up with HITLIST table that connects a WL_ID to an IND_PARTY_ID with a matchFound='Y' so I can filter cust data by matches found.

swuehl
MVP
MVP

It's an absolute performance killer to iterate over a larger table and query the record values using PEEK(). Always try to avoid these kind of loops.

You should be able to use a mapping table with your search names and MapSubstring() in your CustData table to filter your data, something along what is demonstrated in e.g.

Re: Wildcard search need help urgent

tschullo
Creator III
Creator III
Author

I thought about that, but then my search logic is a bit more complicated than a simple wildcard search.

I "Capitalize" the fName, lName and FullName ahead of time

then search for fName in FullName case sensitive

I then search for lName to be in remainder of string after where fName was found, also case sensitive

The results are really good.

Is there a better way to run my query without PEEK?

Should I query the smaller table for one row, then query the large?

swuehl
MVP
MVP

To demonstrate your complications, could you post some sample lines of (mock-up) data for both tables?

tschullo
Creator III
Creator III
Author

WL_CANDIDATES:

fName             lName          

A                      Lee

Amit                Hubar

David              John

CUST_DATA:                                HITLIST:

FullName                                       matchFound

Mr Alexander Lee                          Y

Ami Glee

Happy Lee

Amit Chin Hubarendi                     Y

Kamit Hubar

Davidson Johnson                         Y

tschullo
Creator III
Creator III
Author

And of course if the last name preceeds the first name that is a fail (no match)

swuehl
MVP
MVP

Something like

WL_CANDIDATES:

LOAD * INLINE [

fName,         lName         

A,             Lee

Amit,          Hubar

David,         John

];

MAP1:

MAPPING LOAD

fName, '\'&1&'/'

Resident WL_CANDIDATES;

MAP2:

MAPPING LOAD

lName, '\'&2&'/'

Resident WL_CANDIDATES;

CUST_DATA:

LOAD *, If(KeepChar(Mapped,'\/12')  ='\1/\2/', 'Yes') as Match;

LOAD *, MapSubString('MAP2',MapSubString('MAP1',FullName)) as Mapped

INLINE [                        

FullName                               

Mr Alexander Lee                       

Ami Glee

Happy Lee

Amit Chin Hubarendi                    

Kamit Hubar

Davidson Johnson

];

tschullo
Creator III
Creator III
Author

Hmmm, very nice! The only problem I see, unless I read this wrong, is that you are matching a mix of all last names and first names. the two must be used exclusively together.