Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
What is the surefire way of avoiding duplicate rows when an incremental load refreshes more than once a day? I am currently using this algorithm but noticed it's not always working:
If qvdcreatetime('QVDlocation') Then
Table1:
LOAD *;
FROM [QVDlocation] qvd;
else
Table1:
LOAD
Autonumberhash25(Order_ID,DATETIME) as primarykey,
Date,
Order,
etc.
FROM SQL statement
end if
Concatenate (Table1)
LOAD DISTINCT
Autonumberhash256(Order_ID,DATETIME) as primarykey,
Date,
Order,
etc.
FROM SQL statement WHERE DATETIME>sysdate
Here, DATETIME is an Oracle datatype "Date with timestamp", in the format "DDMMYYYY HH:MM:SS"
Shouldn't the autonumberhash256 work to prevent duplicate rows?
autonumberhash256 merely creates a hash value. It does not check for existing values. For that you need a where clause.
concatenate(Table1)
LOAD Autonumberhash256(Order_ID,DATETIME) as primarykey, ...etc...
WHERE Not Exists(primarykey,Autonumberhash256(Order_ID,DATETIME));
SELECT .... FROM ... WHERE DATETIME>sysdate;
If QVDlocation does not exist then records are loaded from an sql source. The next statement will concatenate records to that. If those records come from the same sql source then you'll likely be loading some records twice.
Thanks Gysbert, you mentioned:
If those records come from the same sql source then you'll likely be loading some records twice.
For my case, the same SQL source is being used. Are you saying some records will load twice even with the WHERE Not Exists(primarykey) clause?
EDIT: I'm also getting an SQL error saying "Field not found: primarykey"
Just tested this again and can see that if I use some other field, the SQL error doesn't happen, e.g. WHERE NOT EXISTS (ORDER_ID).
But using primarykey where primarykey = autonumberhash256(ORDER_ID,DATE_TIME) still causes the SQL error "field not found: primary key". Any ideas?