I would be grateful if someone can see what I'm missing here. I'll explain the scenario:
(given the LastCommitDatestamp values which flow from Replicate, it would appear that the first line there is the original record, and the second line is the current state of the record in the source)
My questions:
Looking forward to learning from y'all about how to deal with this type of situation!
Hi,
In the Change Table you only see the activity that occurred in the DB log. The way Replicate works with Compose is to send the full / initial load - which would have the original record.
If you skipped this - then you would not see the 'original insert'... You are only getting records from the time you start the Replicate task.
Since your source does updates via a DELETE/INSERT pair - that is what you see in the change table.
In general, Compose ignores deletes. (Just because a record is deleted does not mean it should be removed from the data warehouse, as it's still applicable for analytics - e.g. archival processes that run on source systems, or a product you DID sell but don't anymore should still be available for historical analysis in the DW.)
In order to manage history / updates to data, Compose needs some type of 'logical key'. If you are using an RRN - then this is why you are seeing these as 'duplicates'. Because the source does a DELETE/INSERT instead of an update, the RRN is changing.
The issue here is that you have data with no discernible 'logical key' which is updated with delete+insert - any downstream process (even ignoring Compose) would need some form of logical key in order to update the target.
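To make the point concrete, here is a rough Python sketch (not Compose's actual logic). The column names `rrn`, `order_no`, `status` and `last_commit` are made up for illustration, with `last_commit` standing in for the commit timestamp coming from Replicate. It assumes a source "update" arrives as a delete plus an insert with a new RRN, and only shows why keying on the RRN leaves both versions in place while keying on a real logical key lets you keep the latest one.

```python
from datetime import datetime

# Hypothetical rows as they might land in the target. The source "updated"
# order A-1 via DELETE + INSERT, so the new version carries a new RRN.
rows = [
    {"rrn": 101, "order_no": "A-1", "status": "OPEN",
     "last_commit": datetime(2023, 1, 1, 9, 0)},
    {"rrn": 205, "order_no": "A-1", "status": "CLOSED",
     "last_commit": datetime(2023, 1, 2, 9, 0)},
    {"rrn": 102, "order_no": "B-7", "status": "OPEN",
     "last_commit": datetime(2023, 1, 1, 9, 5)},
]


def latest_per_key(rows, key):
    """Keep the most recently committed row for each distinct value of `key`."""
    latest = {}
    for row in rows:
        k = row[key]
        if k not in latest or row["last_commit"] > latest[k]["last_commit"]:
            latest[k] = row
    return list(latest.values())


# Keyed on the RRN, every row looks like a separate record: nothing collapses.
by_rrn = latest_per_key(rows, "rrn")

# Keyed on a logical key (here the made-up order_no), the two versions of
# A-1 collapse into the latest one.
by_order = latest_per_key(rows, "order_no")

print(len(by_rrn), len(by_order))
```

Without something like `order_no` in the data, there is nothing to pass as `key` except the RRN, which is exactly the situation described above.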
I'm curious - you state "I can run a post-load custom script to just delete all but the newest record".
How would you handle that delete? If it's possible that there are legitimate duplicates in the dataset (based on using every business column you have), how would you determine which 'latest' record should be kept and which ones should be deleted?
If you can explain that, we can think about the best way to solve this.