Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi Everyone
I am trying to extract the full name based on certain first names from a huge database. I loaded an excel file which contains all of the names which I need as the first name. I want to use a function which returns the rest of the name if the name is present in the excel file which I load. Is there any function or any way that I could do this?
Below is the script if anyone is curious.
LOAD
INSURED_NAME,
INSURED_TYPE,
NATIONAL_ID,
MT_DRIVER_GENDER,
CATEGORY,
SEGMENT_CODE,
BRANCHES,
ISSUE_DATE
FROM [C:\Users\moath\Desktop\ACTUARIAL_27B_MTR.qvd](qvd);
LOAD INSURED_NAME,
Count
FROM
[C:\Users\moath\Desktop\Test.xlsx]
(ooxml, embedded labels, table is Sheet1);
subfield allows you to get a part of a string by specifying a delimeter (space in your case) and a position.
however, your challenge is whether you can define a consistent rule like LAST NAME is the last word in the name (the rule FIRST NAME is the first word in the name does not appear to apply to you)
if you figure (by profiling your data) that last names are inconsistent (there are indeed last names that have two words like San Sebastian - just pulling names from air) then there will be no consistent rule that can apply to all cases. this will not be a happy ending as you might not get the 100% result you want as there will be conflicting scenarios. the reason i say this is you will find extreme cases: what is the first name of John Horacio Samuel Smith? is it John (you will definitely get a match with John) or is it John Horacio? if you have John Horacio in your firstname list then you should get this match, but if not then he will be called John even if it appears his first name might ne John Horacio.
having said that,
one approach is to do two loops one outer loop is you list of last names, inner loop is your list of first names and do a substring compare - you can use left or some other string function - i would use left so it is easily read.
another approach if your data is huge is make multiple passes (determine first if your list of firstnames will have 2 or 3 max words). using table operation, match 3-word firstnames first, mark those that are already matched, next match 2-word firstnames, again mark them, then match single word firstnames. ive attached a possible QVW that assumes the first names are only up to 2 words
if insuredname is the full name (assuming format is FIRSTNAME<space>rest of name
you can use the subfield to get the first name (add to your load script)
then left join it with your list of first names (add a flag in your load for first names)
any row with the flag is a match.
you need to profile your list of first names, some first names have two words so you may want to partition them as 1s and 2nd names.
also make sure the list of firstnames is unique
you can also do it the other way around:
load your list of firstnames and use the exists function in your final load to test if the subfield (calculated firstname) exists in your loaded firstname list and set the flag
What is the subfield exactly? And will this also work if I have full names with middle or four names? E.g some names are written as:
John Smith
John Michael Smith
John Michael Harry Smith
subfield allows you to get a part of a string by specifying a delimeter (space in your case) and a position.
however, your challenge is whether you can define a consistent rule like LAST NAME is the last word in the name (the rule FIRST NAME is the first word in the name does not appear to apply to you)
if you figure (by profiling your data) that last names are inconsistent (there are indeed last names that have two words like San Sebastian - just pulling names from air) then there will be no consistent rule that can apply to all cases. this will not be a happy ending as you might not get the 100% result you want as there will be conflicting scenarios. the reason i say this is you will find extreme cases: what is the first name of John Horacio Samuel Smith? is it John (you will definitely get a match with John) or is it John Horacio? if you have John Horacio in your firstname list then you should get this match, but if not then he will be called John even if it appears his first name might ne John Horacio.
having said that,
one approach is to do two loops one outer loop is you list of last names, inner loop is your list of first names and do a substring compare - you can use left or some other string function - i would use left so it is easily read.
another approach if your data is huge is make multiple passes (determine first if your list of firstnames will have 2 or 3 max words). using table operation, match 3-word firstnames first, mark those that are already matched, next match 2-word firstnames, again mark them, then match single word firstnames. ive attached a possible QVW that assumes the first names are only up to 2 words
after posting that, i thought of another way. you can define possible firstnames:
Name, possibleFirstname1, possibleFirstname2,possibleFirstname3
John Horacio Samuel Smith, John, John Horacio, John Horacio Samuel
your first pass will be left join using lastname, possibleFirstname1 as firstname
your second pass will be left join using lastname, possibleFirstname2 as firstname and so on
3rd pass will not give a hit but 2nd and 1st may - choose the bigger one
btw, wouldnt your huge database of first names have duplicate names? Joel who lives in the US and Joel who lives in England
there was a typo in the script, this one is the right one
Yeah it would, but I removed duplicates of names so that the subfield names is unique.
I used