Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Ak97
Contributor
Contributor

Return full name based on a specific first name

Hi Everyone

I am trying to extract the full name based on certain first names from a huge database. I loaded an excel file which contains all of the names which I need as the first name. I want to use a function which returns the rest of the name if the name is present in the excel file which I load. Is there any function or any way that I could do this? 

 

Below is the script if anyone is curious.

 

LOAD

INSURED_NAME,
INSURED_TYPE,
NATIONAL_ID,
MT_DRIVER_GENDER,
CATEGORY,
SEGMENT_CODE,
BRANCHES,
ISSUE_DATE
    
FROM [C:\Users\moath\Desktop\ACTUARIAL_27B_MTR.qvd](qvd);

LOAD INSURED_NAME,
     
Count
FROM
[C:\Users\moath\Desktop\Test.xlsx]
(
ooxml, embedded labels, table is Sheet1);

Labels (2)
1 Solution

Accepted Solutions
edwin
Master II
Master II

subfield allows you to get a part of a string by specifying a delimeter (space in your case) and a position.
however, your challenge is whether you can define a consistent rule like LAST NAME is the last word in the name (the rule FIRST NAME is the first word in the name does not appear to apply to you)

if you figure (by profiling your data) that last names are inconsistent (there are indeed last names that have two words like San Sebastian - just pulling names from air) then there will be no consistent rule that can apply to all cases.  this will not be a happy ending as you might not get the 100% result you want as there will be conflicting scenarios.  the reason i say this is you will find extreme cases: what is the first name of John Horacio Samuel Smith?  is it John (you will definitely get a match with John) or is it John Horacio?  if you have John Horacio in your firstname list then you should get this match, but if not then he will be called John even if it appears his first name might ne John Horacio.

having said that,

one approach is to do two loops one outer loop is you list of last names, inner loop is your list of first names and do a substring compare - you can use left or some other string function - i would use left so it is easily read.

another approach if your data is huge is make multiple passes (determine first if your list of firstnames will have 2 or 3 max words).  using table operation, match 3-word firstnames first, mark those that are already matched, next match 2-word firstnames, again mark them, then match single word firstnames.  ive attached a possible QVW that assumes the first names are only up to 2 words 

View solution in original post

10 Replies
edwin
Master II
Master II

if insuredname is the full name (assuming format is FIRSTNAME<space>rest of name
you can use the subfield to get the first name (add to your load script)
then left join it with your list of first names (add a flag in your load for first names)

any row with the flag is a match.

you need to profile your list of first names, some first names have two words so you may want to partition them as 1s and 2nd names.  

also make sure the list of firstnames is unique

edwin
Master II
Master II

you can also do it the other way around:
load your list of firstnames and use the exists function in your final load to test if the subfield (calculated firstname) exists in your loaded firstname list and set the flag

Ak97
Contributor
Contributor
Author

What is the subfield exactly? And will this also work if I have full names with middle or four names? E.g some names are written as: 

 

John Smith

John Michael Smith

John Michael Harry Smith

 

 

edwin
Master II
Master II

subfield allows you to get a part of a string by specifying a delimeter (space in your case) and a position.
however, your challenge is whether you can define a consistent rule like LAST NAME is the last word in the name (the rule FIRST NAME is the first word in the name does not appear to apply to you)

if you figure (by profiling your data) that last names are inconsistent (there are indeed last names that have two words like San Sebastian - just pulling names from air) then there will be no consistent rule that can apply to all cases.  this will not be a happy ending as you might not get the 100% result you want as there will be conflicting scenarios.  the reason i say this is you will find extreme cases: what is the first name of John Horacio Samuel Smith?  is it John (you will definitely get a match with John) or is it John Horacio?  if you have John Horacio in your firstname list then you should get this match, but if not then he will be called John even if it appears his first name might ne John Horacio.

having said that,

one approach is to do two loops one outer loop is you list of last names, inner loop is your list of first names and do a substring compare - you can use left or some other string function - i would use left so it is easily read.

another approach if your data is huge is make multiple passes (determine first if your list of firstnames will have 2 or 3 max words).  using table operation, match 3-word firstnames first, mark those that are already matched, next match 2-word firstnames, again mark them, then match single word firstnames.  ive attached a possible QVW that assumes the first names are only up to 2 words 

edwin
Master II
Master II

after posting that, i thought of another way.  you can define possible firstnames:

Name,                                               possibleFirstname1, possibleFirstname2,possibleFirstname3
John Horacio Samuel Smith, John,                                John Horacio,             John Horacio Samuel

your first pass will be left join using lastname, possibleFirstname1 as firstname

your second pass will be left join using lastname, possibleFirstname2 as firstname and so on

3rd pass will not give a hit but 2nd and 1st may - choose the bigger one

edwin
Master II
Master II

btw, wouldnt your huge database of first names have duplicate names?  Joel who lives in the US and Joel who lives in England 

edwin
Master II
Master II

there was a typo in the script, this one is the right one

Ak97
Contributor
Contributor
Author

Yeah it would, but I removed duplicates of names so that the subfield names is unique. 

Ak97
Contributor
Contributor
Author

I used 

SubField(trim(INSURED_NAME),' ',1) as FIRST_NAME,
 
and then extracted all of the names to excel and removed duplicates and then loaded it again with a dummy variable to filter the first names I get which ended up working.