Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I am trying to figure out the most efficient way to delete rows after an incremental load. This is a common enough question but I have not found anyone with the same data conditions that I have.
The data I’m loading (from a sql server database) is several hundred million rows. Due to the size of the data I do not want to do an INNER JOIN back to the source table to determine deleted rows.
However our database tracks the primary keys of deleted rows with a timestamp, and my Qlik load script is already tracking the last load timestamp, so with a simple sql query I can get the list of rows deleted since my last incremental load. I’d like to use that list – which will likely be very small (typically less than a dozen rows) – and delete them from my loaded data.
Is there any way to do this? It would definitely put a much smaller load on my sql server database.
I'm assuming you store the records you load in a qvd file and that you use an incremental load procedure to update that qvd with the new, changed and deleted record. If that's the case you can first load the primary keys of the deleted records from the SQL database and use the Exists() function in a WHERE clause of the load of the data from the qvd.
DeletedRecords: SELECT PrimaryKey FROM DeletedRecords; MyTable: SELECT PrimaryKey,... other fields ... FROM BigSQLTable WHERE ModififcationDate > $(TimestampOfPreviousIncrLoad); CONCATENATE (MyTable)
LOAD PrimaryKey,... other fields ... FROM MyHistoricalQVD.qvd (qvd) WHERE NOT Exists(PrimaryKey);
STORE MyTable INTO MyHistoricalQVD.qvd (qvd);