Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Jan 20, 2022 10:22:24 AM
Jul 9, 2020 10:50:04 PM
Data warehouses are optimized to store historical data for complex analytics and reporting. Historical data can include transactional history (e.g. store all transactions for seven years), or slowly changing dimensions attribute history (e.g. manage changes to specific attributes like name or address, over time). Consequently, we rarely purge data from the data warehouse.
However, operational systems that feed the data warehouse, frequently delete records. Some systems allow for complete record removal, while others impose restrictions. As a result, the data warehouse can experience side effects of record deletion form the source systems.
In short, managing record deletion in a data warehouse is complicated, and this paper describes the best practices for handling various data deletion scenarios with Qlik Data Integration.
<<UPDATED 3/5/2021 with additional comments on how to handle data marts>>
Do you have an elegant way of detecting if the delete occurred due to normal activity (as in the order was truly deleted) versus purging? (It sounds like time frame would be a way to detect - as in if the order is within the last 6 months or maybe shorter and it is deleted, then its a TRUE delete).
If that's the case, you can use a single-table ETL or pull in the 'D' records that have an OrderDate within the past X months. Now, I don't normally recommend deleting data from your warehouse - in case there is an issue, but if you must - then you could use a single or multi-table ETL to read the 'D' records in the CT table and either delete or mark as deleted.