Hello Qlik Community,
I have an environment with many large tables that are replicated using CDC. The source is MSSQL (using default MS-REPLICATION) and the target is Oracle. I want to switch this replication to use Logstream because I plan to add extra targets.
When I attempt to switch to using Logstream (via Run → Resume Processing), I encounter the following error:
E: Position does not exist [1002510] (at_audit_reader.c:642)
I understand that this error has already been discussed in the forum. Therefore, I want to use Run → Advanced Run Options to specify Date and Time.
My key question is: How can I determine the correct Date and Time to avoid any duplicates or gaps in the data?
I attempted to use the table attrep_status, which has a column source_timestamp_applied. According to the documentation:
"This is the timestamp of the last transaction committed. In a bulk apply, this will be the timestamp for the commit of the last transaction in that batch. It will only be changed as part of the last transaction in the batch."
So my procedure is: I stop the task, reconfigure the source endpoint to use the Log Stream, and, before setting the Date and Time, check the value of source_timestamp_applied and add 1 second. Then I run the task.
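For reference, a query along these lines (a minimal sketch, assuming the default attrep_status control table on the Oracle target; 'MyTask' is a placeholder task name) is how I read that value:

```sql
-- Read the last-applied commit timestamp for one task from the
-- Replicate control table, plus the +1 second restart candidate.
-- Assumes the default attrep_status table; 'MyTask' is a placeholder.
SELECT task_name,
       source_timestamp_applied,
       source_timestamp_applied + INTERVAL '1' SECOND AS restart_candidate
FROM   attrep_status
WHERE  task_name = 'MyTask';
```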
Am I doing this correctly? Is there a better procedure to avoid duplicates and gaps in the data without reloading data?
Thank you for your assistance.
Hello @TomaszRomanowski,
In general, starting from an LSN is more accurate than starting from a TIMESTAMP, because multiple transactions can occur within the same second. However, manually starting a task from either an LSN or a TIMESTAMP cannot guarantee no duplicates and no gaps unless other conditions are met. This is completely different from resuming a task; the RESUME function is not subject to the behavior described below.
This is not a Qlik Replicate defect or bad design; it is common RDBMS behavior. Assume we have the following transactions in the source DB:
| LSN | Transaction | Transaction Timestamp |
|-----|-------------|-----------------------|
| 100 | (trxID=1) UPDATE TABLE A ... | 01:00:00 PM |
| 200 | (trxID=2) INSERT INTO TABLE B ... | 02:00:00 PM |
| 300 | (trxID=2) COMMIT | 03:00:00 PM |
| 400 | Changes applied & task STOP | 04:00:00 PM |
| 500 | (trxID=1) COMMIT | 05:00:00 PM |
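In T-SQL terms, the overlap looks roughly like this (a sketch only; the table names and values are placeholders, and the two sessions run side by side in separate connections):

```sql
-- Session 1 (trxID=1): opens early, commits last.
BEGIN TRANSACTION;                          -- ~01:00 PM
UPDATE TableA SET col1 = 'x' WHERE id = 1;  -- logged at LSN 100
-- ...the transaction stays open for hours...

-- Session 2 (trxID=2): starts and commits while session 1 is open.
BEGIN TRANSACTION;                          -- ~02:00 PM
INSERT INTO TableB (id) VALUES (1);         -- logged at LSN 200
COMMIT;                                     -- LSN 300, 03:00 PM

-- Session 1, later:
COMMIT;                                     -- LSN 500, 05:00 PM
```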
Assume the task stopped after trxID=2 was applied on the target side. Now, if we restart the task:

- From the stop position (LSN 400 / 04:00:00 PM): the reader starts after trxID=1's UPDATE at LSN 100, so when trxID=1's COMMIT arrives at LSN 500 that change cannot be applied. Result: a gap.
- From a position early enough to capture trxID=1 (LSN 100 / 01:00:00 PM): trxID=2 (LSN 200 to 300) is read and applied a second time. Result: duplicates.

In short, it's difficult to ensure no duplicates or gaps by simply restarting from a specific stream position. To handle this, start from a position old enough to cover the oldest open transaction (so no change is lost) and rely on the task's apply error handling, for example the attrep_apply_exceptions table, to absorb the resulting duplicates.
I hope this clarifies the process.
John.
I had a similar circumstance a few months back. However, my target was another SQL Server, so I was able to use the tablediff.exe available from Microsoft. I don't believe it works with sources/targets other than MS-SQL.
My approach was to wait until there were no active transactions and then use Resume from the point at which the Log Stream task started.
I don't believe you'll get duplicates, since Qlik checks the PK. I believe the only way to check for gaps would be some mechanism like tablediff or Redgate; a table count isn't necessarily accurate, as it won't account for data differences.
Thank you for your reply.
However, in an environment with many large tables running 24/7, it is complicated to verify whether the data is synchronized.
>I don't believe you'll get duplicates since Qlik checks the PK.
In the case of inserts, if duplicates occur, I will see rows in the exceptions table. But what happens with updates and deletes? Those could lead to data inconsistency.
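For example, after restarting I could look for those errors with something like this (a sketch assuming the default attrep_apply_exceptions control table; the task name and timestamp are placeholders):

```sql
-- List apply exceptions (duplicate key, record not found, ...)
-- logged since the restart. 'MyTask' and the date are placeholders.
SELECT task_name,
       table_owner,
       table_name,
       error_time,
       error
FROM   attrep_apply_exceptions
WHERE  task_name = 'MyTask'
  AND  error_time >= TIMESTAMP '2024-01-01 00:00:00'
ORDER  BY error_time;
```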
Therefore, I would like to know if I can use the attrep_status table and the source_timestamp_applied column to ensure that the data is in sync. I'm certain that someone else has encountered the same issue as I have.
Any insights or suggestions would be greatly appreciated.
Hi @TomaszRomanowski,
To avoid losing any changes, you should use SOURCE_TAIL_POSITION or SOURCE_TAIL_TIMESTAMP from the attrep_status table. These two columns store the position and timestamp of the oldest transaction that has started but not yet committed. I would suggest setting the value a bit earlier than that.
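For example (a minimal sketch, assuming the default attrep_status control table; 'MyTask' is a placeholder task name):

```sql
-- Read the position/timestamp of the oldest still-open transaction;
-- starting slightly before this point avoids losing changes.
SELECT task_name,
       source_tail_position,
       source_tail_timestamp
FROM   attrep_status
WHERE  task_name = 'MyTask';
```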
Since duplicates will occur, you may encounter duplicate record errors (INSERT) and record-not-found errors (UPDATE/DELETE); depending on your task settings, these errors will be written to the attrep_apply_exceptions table.
Regards,
Desmond
Thank you for your clarification.
Thank you for your support, @TomaszRomanowski!
Regards,
John.