Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi,
I am loading some Customer IDs in from 2 seperate databases and I want to identify which Customer ID's are contained in each table and which are not.
Hwo can I create flag script side so that I could correctly identify 1,3,5 as being duplicated but 2, 4 are not?
Cus_ID |
---|
1 |
2 |
3 |
4 |
5 |
Cus_ID |
---|
1 |
3 |
5 |
Thanks
Stuart
There are different approaches possible, it may depend on the final data model you want to achieve.
For example:
LOAD Cus_ID
'Table1' as Source
FROM Table1Source;
CONCATENATE
LOAD Cus_ID,
'Table2' as Souce,
If(Exists(Cus_ID),1) as DuplicateFlag
FROM Table2Source;
This will flag the duplicates, but only in the second table records (still you are able to filter the Cus_ID accordingly).
Cust:
Load
Cus_ID , Flag
1,A
2,A
.
Concatenate
Cus_ID , Flag
1,B
3,B
5,B
Then Flag can be use in set analysis.
Vikas
There are different approaches possible, it may depend on the final data model you want to achieve.
For example:
LOAD Cus_ID
'Table1' as Source
FROM Table1Source;
CONCATENATE
LOAD Cus_ID,
'Table2' as Souce,
If(Exists(Cus_ID),1) as DuplicateFlag
FROM Table2Source;
This will flag the duplicates, but only in the second table records (still you are able to filter the Cus_ID accordingly).
Hi
Try This.
Copy the same to your script and reload.
Table_Missing:
LOAD * Inline [
Cust_Id
1
3
5
];
Table_All:
LOAD if(Exists(Cust_Id),1,0) as Flag, * Inline [
Cust_Id
1
2
3
4
5
];
Thanks Stefan - is there any way to extended this method so that you can see the dupes on both sets of records?
As said, there are multiple possible solution, e.g one without exists():
Table:
LOAD Cus_ID
'Table1' as Source
FROM Table1Source;
CONCATENATE
LOAD Cus_ID,
'Table2' as Souce
FROM Table2Source;
LEFT JOIN (Table)
LOAD Cus_ID,
If(Count(DISTINCT Source)>1, 'Duplicate','No Duplicate') as Flag
RESIDENT Table
GROUP BY Cus_ID;
thanks - simple but genius