Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi, kindly advise on how can I get the output in ADDRESS_2 IN ADDRESS_1 column in Table 2 which check whether each row in Table 2 Address is in ADDRESS_1 column in Table 1 and if exist, then display as Yes else display as No.
I have tried to use SUBSTRINGCOUNT() and it works but as my data in both tables are quite large (More than 2 million records), therefore it will consume a lot of memory and the table will take a long time to load.
Can advise whether there are other alternatives besides using SUBSTRINGCOUNT() for large data sets.
Thank you.
Table 1
ADDRESS_1 |
Address A |
Address B |
Address C |
Address D |
Table 2
ADDRESS_2 | ADDRESS_2 IN ADDRESS_1 |
Address 1 | No |
Address A | Yes |
Address 2 | No |
Address 3 | No |
Address B | Yes |
I think you should link the 2 tables by duplicating the ADDRESS_1 and ADDRESS_2 fields and aliasing the 2nd ADDRESS_1 as ADDRESS and the 2nd ADDRESS_2 as ADDRESS
Table1:
LOAD
ADDRESS_1,
ADDRESS_1 as ADDRESS
FROM <wherever>
Table2:
LOAD
ADDRESS_2,
ADDRESS_2 as ADDRESS,
ADDRESS_2 in ADDRESS_1
FROM <wherever>
Then a link will form on ADDRESS. Any common ADDRESS in the 2 tables will render 'yes'.
test:
mapping load Address1,'yes' as flag;
load * inline [
Address1
AB
AC
AD
];
load *,applymap('test',Address2,'No') as newfield;
load * inline [
Address2
A1
AB
A2
A3
AC
];
It could be per exists(), like:
t1: load ADDRESS1 from X;
t2: load ADDRESS2, pick(-exists(ADDRESS1, ADDRESS2) + 1, 'no', 'yes') as Flag
from Y;
and will perform quite fast.