Qlik Community

Qlik Compose for Data Warehouses

Discussion board for collaboration on Qlik Compose for Data Warehouses.

fanny_betd
Partner

DWH CDC not able to manage multiple changes in source

When applying multiple changes to the same source table (here: produits) without re-running the DWH_CDC between changes, an error occurs:

Capture d’écran 2021-02-11 à 10.52.43.png

Log details: "Load job aborted - Unexpected Primary Key Violations detected in produits"

Capture d’écran 2021-02-11 à 10.53.04.png

To work around the issue, we had to delete all inserted rows from produits_ct and re-run the DWH_CDC; after that, no more errors.

We ran this test:

  1. apply price change from x1 to x2
  2. apply price change from x2 to x3
  3. run the DWH_CDC

> crash

We expected Compose to handle either both modifications or at least the last one. Instead, the error shown above occurs; it seems Compose does not handle the case where multiple changes are pending for the same row.

We then tried to:

  1. apply price change from x1 to x2
  2. run the DWH_CDC
  3. apply price change from x2 to x3
  4. run the DWH_CDC

> no crash
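The two tests differ only in how many pending changes for the same key sit in produits_ct when the load runs. A minimal sketch in Python (sqlite3, purely illustrative; this is not Compose's actual merge logic) of why a batch that treats every change row as a new record keyed only on the primary key fails, while applying changes in timestamp order succeeds:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE produits (id INTEGER PRIMARY KEY, price REAL)")
con.execute("INSERT INTO produits VALUES (1, 10.0)")  # initial price x1

# Two pending changes for the same key, as they would sit in produits_ct
# (header__timestamp orders them)
ct_rows = [
    (1, 20.0, "2021-02-11 10:00:00"),  # x1 -> x2
    (1, 30.0, "2021-02-11 10:01:00"),  # x2 -> x3
]

# Naive batch load: every change row inserted as-is, keyed only on id
try:
    con.executemany(
        "INSERT INTO produits VALUES (?, ?)",
        [(k, p) for k, p, _ in ct_rows],
    )
except sqlite3.IntegrityError as e:
    print("load aborted:", e)  # duplicate key, like the reported PK violation

# Applying the changes in timestamp order as updates succeeds:
for k, p, _ in sorted(ct_rows, key=lambda r: r[2]):
    con.execute("UPDATE produits SET price = ? WHERE id = ?", (p, k))
print(con.execute("SELECT price FROM produits WHERE id = 1").fetchone()[0])  # 30.0
```

Here the "crash" is the unique-key violation, and the second loop mimics running DWH_CDC once per change.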

Does anyone have any recommendations for managing this situation?

Thank you


Accepted Solutions
TimGarrod
Employee

Hi @fanny_betd - Compose DOES support processing multiple changes in the CT table.

When you are processing __CT data with Compose, there can always be multiple changes for the same "key" value. Any CDC ETL set should have the "Handle Duplicates" checkbox enabled for the mappings in that ETL set. This tells Compose that there may be duplicate records for each key with different FROM dates (FD). (Note: by default, Compose uses the header__timestamp provided by Replicate for the FD column.)
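In type-2 terms, "Handle Duplicates" means two rows with the same business key are not a collision as long as their FROM dates differ. A small Python sketch of that idea (the FD/TD names and row shapes are illustrative, not Compose internals):

```python
from collections import defaultdict

# __CT rows for one key: (business_key, new_price, header__timestamp)
ct_rows = [
    (1, 20.0, "2021-02-11 10:00:00"),
    (1, 30.0, "2021-02-11 10:01:00"),
]

# Version each key by its FROM date (FD): the (key, FD) pair is unique,
# so multiple changes per key load cleanly as separate history rows.
history = defaultdict(list)
for key, price, fd in sorted(ct_rows, key=lambda r: r[2]):
    history[key].append({"FD": fd, "price": price})

# Close each version at the FD of the next one (the latest stays open-ended)
for versions in history.values():
    for cur, nxt in zip(versions, versions[1:] + [None]):
        cur["TD"] = nxt["FD"] if nxt else None
```

Both changes survive as distinct versions of key 1 instead of violating the primary key.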

 

See the screenshot - notice that "Handle Duplicates" is checked for all mappings in my "CDC" ETL set. (I have not checked these in the full-load ETL set.)

TimGarrod_0-1615989689185.png

 

 

Also - be aware that the default behavior in Compose is to track and manage EVERY change from the source. However, if you only wish to keep the latest change for a given day, you can adjust the "Default History Resolution" setting for each ETL set (thus allowing you to determine which groups of mappings should manage every change vs. only the last change).
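With a day-level history resolution, intraday changes collapse to the last one per key per day. A rough sketch of the effect (illustrative only; assumes "YYYY-MM-DD HH:MM:SS" timestamps):

```python
# Keep only the latest change per (key, day) - roughly what a day-level
# "Default History Resolution" does to intraday changes.
ct_rows = [
    (1, 20.0, "2021-02-11 10:00:00"),
    (1, 30.0, "2021-02-11 10:01:00"),  # same key, same day: supersedes the row above
    (1, 40.0, "2021-02-12 09:00:00"),  # next day: kept as a separate version
]

latest = {}
for key, price, ts in sorted(ct_rows, key=lambda r: r[2]):
    day = ts.split(" ")[0]
    latest[(key, day)] = price  # later timestamp overwrites earlier, same day
```

Only two versions survive: 30.0 for 2021-02-11 and 40.0 for 2021-02-12.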

TimGarrod_1-1615989817995.png

 

Hope that helps!


5 Replies
barretthowe
Contributor

You can accomplish CDC with continuous streaming for real-time updates, or asynchronously by polling the source periodically and replicating the changed data.


Shannon36
Contributor

CDC can bite you, REALLY hard, in production.

CDC will lock VLFs in the transaction log until the data is written over to the CDC "target". In essence, if your CDC stops working, your log backups will not be able to "clean" your transaction log, and after a couple of hours to a couple of days, depending on your environment and workload, your production database server will... die.

TimGarrod
Employee

Hi @Shannon36 - this is not entirely accurate. Replicate has two options to help manage TLOG truncation: sp_repldone (which could cause what you've described) or starting a transaction in a dummy table.

In the second case, if the Replicate task is down, there is no transaction or lock on the TLOG, and your backups can checkpoint and thus "truncate" the transaction log. When Replicate resumes, it reads MSDB to determine the LSN it is looking for. As long as the backup is a native SQL backup and is still available on the server in some capacity, Replicate should be able to read from the backup log.

If you are having the issue you describe above, I recommend opening a support ticket to get help configuring Replicate to prevent this issue from happening. Quite frankly, if Replicate is down for a couple of days, you are likely better off doing a reload than trying to navigate days' worth of activity in TLOGs (depending on your volume of changes, of course).

 

Additionally, if this is a problem you hit often, there is also the option to use LogStream, where the Replicate capture task is not impacted by the delivery task. Replicate captures and stores the transactions on its server, and a secondary task reads that "LogStream" to send to the target. You need enough storage space on the Replicate server to house the data for whatever your down period is, but it means the capture of transactions has occurred and is not impacted by a target being down. There's no free lunch on this type of problem, though: if you are doing transactional replication and you have a prolonged outage of a couple of days because your target is down, something has to give somewhere. Just know there are ways to mitigate the impact to production databases.

 

Hope that helps 🙂

Price7466
Contributor


@fanny_betd wrote:

> When applying multiple changes in the same source (here table named: produits), without re-run the DWH_CDC between each changes, an error occurs: [...]


Everything is laid out very openly, with a clear description of the issue. It was truly informative. Many thanks for sharing!