I have loaded 2 tables with exactly the same columns: TABLE_A and TABLE_B.
I want to be able to view the rows that are in TABLE_B but not in TABLE_A. The 2 tables are very large (around 160 million rows) and there are around 5 million extra rows in TABLE_B so the most efficient way of doing this would be very helpful. Just being able to see some of the rows that are different would still be helpful.
I have had a look at your 'Test Join' file and it almost looks like what I need. However, instead of seeing matching rows I need to see the rows that don't match. How should I do this?
The qvw I suggested contains a lisbox named "Flag"...select "Table 1 rows" or "Table 2 rows" and you will see only rows that belongs to Table 1 or Table 2.
Another way to achieve your needs is to use not exists in your load statement.
Will your example also work if I am doing a SQL SELECT from a DB? Something like:
'QVD' as Source
'DB' as Source
WHERE NOT EXISTS(Key);
I am still having trouble with this. When I execute the above script I get a SQL Select error. I don't want the 'Where not exists' to be executed in the DB, it should be done once the data has been loaded when joining to Table_1. Is there another way to do this or have I not understood the 'Where not Exists' correctly?
When you fetch the data from your sql database then then it looks like as
SQL SELECT *
FROM CIIS.dbo."REF_OFFICE_LOCATION" WHERE NOT EXISTS(yourfield);
Assuming your tables have a primary key and you want to load all the data from both tables. You can create a flag field that indicates if the record from table B is a duplicate:
sql select PrimaryKeyField, field2, field3, ...., 'source1' as source' from sourcetable1;
load *, if(exists(PrimaryKeyField),1,0) as FlagDuplicate;
sql select PrimaryKeyField, field2, field3, ...., 'source2' as source' from sourcetable2;
You can then select 1 in a listbox that has FlagDuplicate as field to find the records that are duplicates, or select 0 to find the records from table B that are not a duplicate.