Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello !
A typical problem for me is like this :
Say that you have a clients database with a CITY ADDRESS field.
Then you have another table with all of the CITIES NAMES . A table from the Email Company. All the valid city names.
I want to merge two tables creating a new field , CITY STATUS, that will be OK if the client CITY ADDRESS is found on the CITIES NAMES table. And I want status to be ERROR if the CITY ADDRESS is NOT FOUND on the other table.
How to do that ? Lookup is not a god solution cause the database is huge.
I have provided a sample application.
Thanks !
One approach:
LEFT JOIN (CLIENTS)
LOAD DISTINCT NAME_MAIL_CITY AS CITY_CLIENT
,'OK' as TEMP_CITY_STATUS
RESIDENT MAIL_CITY
;
LEFT JOIN (CLIENTS)
LOAD
CODE_CLIENT
,if(TEMP_CITY_STATUS='OK','OK','ERROR') AS CITY_STATUS
RESIDENT CLIENTS
;
DROP FIELD TEMP_CITY_STATUS
;
Another approach:
CITYMAP:
MAPPING LOAD
DISTINCT NAME_MAIL_CITY
,'OK' AS CITY_STATUS
RESIDENT MAIL_CITY
;
CLIENTS:
LOAD *
,applymap('CITYMAP',CITY_CLIENT,'ERROR') AS CITY_STATUS
FROM TEST158P.CSV (ANSI, TXT, DELIMITER IS ';', EMBEDDED LABELS, NO QUOTES);
Another approach:
CLIENTS:
LOAD *
,if(exists(NAME_MAIL_CITY,CITY_CLIENT),'OK','ERROR') AS CITY_STATUS
FROM TEST158P.CSV (ANSI, TXT, DELIMITER IS ';', EMBEDDED LABELS, NO QUOTES);
You'd have to experiment to see which is fastest on your actual data. I'm betting #3, and it's the simplest too.
One approach:
LEFT JOIN (CLIENTS)
LOAD DISTINCT NAME_MAIL_CITY AS CITY_CLIENT
,'OK' as TEMP_CITY_STATUS
RESIDENT MAIL_CITY
;
LEFT JOIN (CLIENTS)
LOAD
CODE_CLIENT
,if(TEMP_CITY_STATUS='OK','OK','ERROR') AS CITY_STATUS
RESIDENT CLIENTS
;
DROP FIELD TEMP_CITY_STATUS
;
Another approach:
CITYMAP:
MAPPING LOAD
DISTINCT NAME_MAIL_CITY
,'OK' AS CITY_STATUS
RESIDENT MAIL_CITY
;
CLIENTS:
LOAD *
,applymap('CITYMAP',CITY_CLIENT,'ERROR') AS CITY_STATUS
FROM TEST158P.CSV (ANSI, TXT, DELIMITER IS ';', EMBEDDED LABELS, NO QUOTES);
Another approach:
CLIENTS:
LOAD *
,if(exists(NAME_MAIL_CITY,CITY_CLIENT),'OK','ERROR') AS CITY_STATUS
FROM TEST158P.CSV (ANSI, TXT, DELIMITER IS ';', EMBEDDED LABELS, NO QUOTES);
You'd have to experiment to see which is fastest on your actual data. I'm betting #3, and it's the simplest too.
I prefer #3. It's the simplest one.
On the prototype it ran very fast.
I'll transport to the original application to see how fast or slow it goes.
Thanks !
Indeed #3 ( if exists approach) is the best one.
Simple to understand and fast to run.
Thanks !