pcrocker
Contributor

Incremental Load

Hello all, 

I am attempting to use an incremental load to speed up my data load process, but I am having an issue with one of my transaction tables. This table gains new lines but never updates or deletes them. When I run the following script (edited to remove specific information), I find that some of the data is missing, and there doesn't appear to be any clear pattern to which lines are missed. Is there anything wrong with this script, or does anybody have an idea why some data may be missing? Thank you.

TABLE_NAME:
SQL SELECT PRIMARY_KEY,
    DATE_TIME
    -- a few other columns here
FROM source_table
WHERE DATE_TIME > '$(vUpdateDate)';

Concatenate LOAD * FROM '(QVDFilePath).qvd' (qvd);

STORE TABLE_NAME INTO '(QVDFilePath).qvd' (qvd);

[UpdateDate]:
Load max(DATE_TIME) as UpdateDate
resident TABLE_NAME;

Let vUpdateDate = Timestamp(Peek('UpdateDate', 0, 'UpdateDate'), 'YYYY-MM-DD hh:mm:ss.fff');

DROP TABLE TABLE_NAME;

3 Replies
Or
MVP

Perhaps I missed something, but I don't see anything in your code that actually deals with lines that were updated or deleted. You're loading any lines that changed since your previous run and concatenating that with all lines that existed last time, without testing whether anything changed.

For updates, it seems that you would need to test the QVD load with a NOT EXISTS for the primary key (to avoid loading the same line twice if it previously existed but was updated since the last load).

I'm not sure how you'd expect to tell that a line was deleted from the source table. You only load lines that have changed since your previous load, meaning there is no full list of lines to check against and see if a line was deleted.
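The where not exists() guard described above could be sketched against the original script like this (vQVDFilePath is a hypothetical variable standing in for the edited-out path; note that not exists() disables the optimized QVD load):

```
// Skip any QVD row whose key was already reloaded from the source,
// so an updated line does not appear twice in the final table
Concatenate (TABLE_NAME)
LOAD * FROM [$(vQVDFilePath)] (qvd)
WHERE NOT EXISTS(PRIMARY_KEY);
```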

marcus_sommer

If records are missing at random, it could be that those records were locked, that their access rights changed, or that the query was ended by another access or by a timeout (quite often this would raise an error, but not necessarily).

If the same records are always missing, there may be a conflict with the capabilities or settings of the driver in use.

Further, you could lose records that were created, or became accessible, on the current day or retroactively after the last execution of the load. To avoid this, you could change the where clause from > to >= and also add a matching < condition to the qvd load. You would lose the optimized QVD load with that, but with some effort you could switch this logic to a where exists().

IMO it would be better to use your PRIMARY_KEY as the filter for the where exists().

- Marcus
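Marcus's >= suggestion could be sketched like this (vQVDFilePath is a hypothetical path variable; the key-based filter variant is shown, since it avoids a second date condition):

```
// >= instead of >: rows sharing the boundary timestamp are not skipped;
// the duplicates this creates are filtered out in the QVD load below
TABLE_NAME:
SQL SELECT PRIMARY_KEY,
    DATE_TIME
FROM source_table
WHERE DATE_TIME >= '$(vUpdateDate)';

// Filter on the key instead of the date; not exists() costs the
// optimized QVD load, as Marcus notes
Concatenate (TABLE_NAME)
LOAD * FROM [$(vQVDFilePath)] (qvd)
WHERE NOT EXISTS(PRIMARY_KEY);
```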

pcrocker
Contributor
Author

Thank you for the suggestions. The issue was with the source database: there are two date fields that are used seemingly interchangeably by the business, so I needed to add the second one as a condition to the query. In addition, I changed the date conditions to >= and added a where not exists() to the concatenate, since it turns out these tables do update (but do not delete).
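The final fix described above might look roughly like this (SECOND_DATE and vQVDFilePath are hypothetical names standing in for the real field and path, which aren't given in the thread):

```
// Pull rows touched via either of the two business date fields;
// >= ensures rows on the boundary timestamp are not skipped
TABLE_NAME:
SQL SELECT PRIMARY_KEY,
    DATE_TIME
FROM source_table
WHERE DATE_TIME >= '$(vUpdateDate)'
   OR SECOND_DATE >= '$(vUpdateDate)';

// Tables update (but never delete), so drop stale QVD copies of
// any key that was just reloaded from the source
Concatenate (TABLE_NAME)
LOAD * FROM [$(vQVDFilePath)] (qvd)
WHERE NOT EXISTS(PRIMARY_KEY);

STORE TABLE_NAME INTO [$(vQVDFilePath)] (qvd);
```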