Skip to main content
Announcements
Qlik Connect 2024! Seize endless possibilities! LEARN MORE
cancel
Showing results for 
Search instead for 
Did you mean: 
sifatnabil
Specialist
Specialist

Incremental load: how to avoid duplicate rows?

What is the surefire way of avoiding duplicate rows when an incremental load refreshes more than once a day? I am currently using this algorithm but noticed it's not always working:

If qvdcreatetime('QVDlocation') Then

Table1:

LOAD *;

FROM [QVDlocation] qvd;

else

Table1:

LOAD

Autonumberhash25(Order_ID,DATETIME) as primarykey,

Date,

Order,

etc.

FROM SQL statement

end if

Concatenate (Table1)

LOAD DISTINCT

Autonumberhash256(Order_ID,DATETIME) as primarykey,

Date,

Order,

etc.

FROM SQL statement WHERE DATETIME>sysdate

Here, DATETIME is an Oracle datatype "Date with timestamp", in the format "DDMMYYYY HH:MM:SS"

Shouldn't the autonumberhash256 work to prevent duplicate rows?

3 Replies
Gysbert_Wassenaar

autonumberhash256 merely creates a hash value. It does not check for existing values. For that you need a where clause.


concatenate(Table1)

LOAD Autonumberhash256(Order_ID,DATETIME) as primarykey, ...etc...

WHERE Not Exists(primarykey,Autonumberhash256(Order_ID,DATETIME));

SELECT .... FROM ... WHERE DATETIME>sysdate;


If QVDlocation does not exist then records are loaded from an sql source. The next statement will concatenate records to that. If those records come from the same sql source then you'll likely be loading some records twice.


talk is cheap, supply exceeds demand
sifatnabil
Specialist
Specialist
Author

Thanks Gysbert, you mentioned:

If those records come from the same sql source then you'll likely be loading some records twice.


For my case, the same SQL source is being used. Are you saying some records will load twice even with the WHERE Not Exists(primarykey) clause?


EDIT: I'm also getting an SQL error saying "Field not found: primarykey"

sifatnabil
Specialist
Specialist
Author

Just tested this again and can see that if I use some other field, the SQL error doesn't happen, e.g. WHERE NOT EXISTS (ORDER_ID).

But using primarykey where primarykey = autonumberhash256(ORDER_ID,DATE_TIME) still causes the SQL error "field not found: primary key". Any ideas?