Skip to main content

Suggest an Idea

Vote for your favorite Qlik product ideas and add your own suggestions.

Request for new BEFOREIMAGE staging tables in Compose for CDC processing

DWH4Fun
Contributor III
Contributor III

Request for new BEFOREIMAGE staging tables in Compose for CDC processing

During Replicate CDC, a record that is updated is captured twice in the landing __CT tables. One record for header__operation = BEFOREIMAGE and one record for header__operation = UPDATE.   However, during Compose CDC processing, the staging tables only collect the UPDATE record.  

There is value in the BEFORIMAGE record in any use case where the upstream system allows an update of a row that changes an entity's HUB key.  When this occurs, it is not possible to determine the prior value in order to take appropriate action in the HUB to soft delete the row.  Clean-up can only occur during the next Full Load since custom logic can compare what is in the HUB table and soft delete what is no longer present in the Full Load landing table.

A simple example: Stock Market Exchange Bank Holidays

Our EXCHANGE_HOLIDAY entity has its HUB key defined as Exchange and Holiday Date.  When a mistake is made and a holiday must be corrected for a given exchange in the upstream system, the UI allows the update of the date on an  existing row.  As a result of CDC processing, the UPDATE record ends up being a new unique HUB row.  We cannot soft delete the prior HUB key because the previous holiday date is unknown at the time of CDC processing.  At this point we must rely on the next Full Load process to correctly soft delete the previous HUB key.  The risk is that the data made available to clients is incorrect until the next Full Load, which is typically the start of the next business day.

In an ideal system, a Data Operator would perform a delete and then insert since the combination of exchange and date is unique even in the upstream system.   Given we have no control over how data is physically managed, we can only react based on the data provided.  Therefore, I would request Compose support a configuration at the entity level (not all entities have such issues) where we can enable BEFORIMAGE records to be captured in their own staging table.  This way, we can make use of custom ETL hooks to interrogate the BEFOREIMAGE rows when needed in order to support soft delete during CDC.  I don't expect Compose to do any processing of BEFORIMAGE records, just capture them for use in custom ETL hooks and clean-up the tables like all other staging tables.

We have a strategic vision to run in CDC only mode without the need of Full Loads outside of initial start-up and cases to resynch when data loss is suspected.  We currently have a dozen such data models that require a Full Load to capture soft deletes.

 

 

2 Comments
Tzachi_Nissim
Employee
Employee

Thank you for your suggestion. I understand the scenario you describe. At this point I do not see us addressing this in Compose anytime soon, but I do suggest that you submit this for Replicate (when a PK changes, write the CT as DELETE+INSERT instead of BEFOREIMAGE+UPDATE). 

Regards,

Tzachi

Status changed to: Open - Not Planned
Meghann_MacDonald

Archiving due to age of idea and relevancy. If it is still needed please resubmit.

Meghann

Status changed to: Closed - Archived