MoeE
Partner - Creator III

How do keys defined in Replicate prevent duplicates from reaching the target?

Hi,

I have run into some interesting behaviour while doing some testing. I have a task that replicates a single table from Oracle to SQL Server. Neither the source table nor the target table has a primary key or unique index defined. However, in Replicate -> Table Settings -> Transform, I've defined a combination of three columns as keys.

MoeyE_0-1733445220992.png

I tested inserting some duplicates on the source database; however, these duplicates are not applied to the target database, even though the target has no primary key or unique index defined on it. How exactly does this work? How does Replicate identify duplicates? Does it read the incoming records and compare them against the target table to decide whether or not to apply them? Is this less performant than having an actual PK/unique index on the target table? Please help me understand how this works; it's appreciated as always. Thanks.

Regards,

Mohammed

Labels (2)
8 Replies
john_wang
Support

Hello Mohammed, @MoeE 

Good morning and thanks for reaching out!

In newer major versions, Qlik Replicate does not check for duplicates when applying changes to the target. Instead, it improves performance by first deleting the corresponding rows (based on the primary key) and then inserting the updated rows.

To analyze the details of this behavior, you can set the TARGET_APPLY logging component to Verbose and review the task log file.
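As an illustration only (not Replicate's actual internals), the delete-then-insert apply described above can be sketched like this: every target row whose Replicate-defined key appears in the incoming batch is deleted, then the incoming rows are inserted, so source-side duplicates collapse to a single row per key:

```python
# Illustrative sketch only -- not Replicate's actual implementation.
# Applies a change batch by deleting target rows that match the
# Replicate-defined key, then inserting the incoming rows.

def apply_batch(target_rows, incoming_rows, key_cols):
    """Rows are dicts; key_cols is a tuple of the columns defined as the key."""
    def key(row):
        return tuple(row[c] for c in key_cols)

    incoming_keys = {key(r) for r in incoming_rows}
    # Step 1: delete every target row whose key appears in the batch.
    survivors = [r for r in target_rows if key(r) not in incoming_keys]
    # Step 2: insert the batch; the last change per key wins, so
    # duplicate keys never reach the target as extra rows.
    latest = {key(r): r for r in incoming_rows}
    return survivors + list(latest.values())
```

This is why no duplicate check against the target is needed: the delete step guarantees the key is absent before the insert happens.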

Regards,

John.

Help users find answers! Do not forget to mark a solution that worked for you! If already marked, give it a thumbs up!
Heinvandenheuvel
Specialist III

@MoeE - What version are you using?

@john_wang  - "In newer major versions"

Please indicate the transition point.

@john_wang    - "Replicate...  improves performance by first deleting the corresponding rows ... "

Hmmm, is it (now?) unconditionally doing this?

My understanding is that this is only used when the task setting for Error Handling - Apply Conflicts - "Duplicate key when applying INSERT" is set to "UPDATE the existing target record".

Hein.

(Screenshot: task settings, Error Handling page)

john_wang
Support

Hello @Heinvandenheuvel , @MoeE ,

Yes, you are right, Hein. In Qlik Replicate 2024.5/2024.11, it unconditionally deletes first and then inserts the updated row back when the task is set to UPSERT mode.

In previous versions, Replicate would detect the conflict first and, if a PK violation occurred, convert the operation.

Regards,

John.

Heinvandenheuvel
Specialist III

@john_wang "it's unconditionally to do deleting first then inserting updated row back if the task set "

We are miscommunicating. When I wrote unconditionally, I meant without customer control, out of the box.

But it seems it still _is_ conditional on upsert mode, which is an error-handling end-user choice and OFF by default.

>> In previous versions, Replicate will detect the confliction first, if PK violates then convert the operations."

I beg to differ. In previous (and current!?) versions, it would submit a batch of rows to update; a duplicate error would be found, and all of the updates would fail. Replicate would then attempt an update for each individual row in the batch for that table to determine which row caused the batch failure, and then Ignore, Log, Suspend, or Stop, unless "UPDATE the existing target record" is selected. This may take 'forever' for certain non-OLTP targets, e.g. 2 seconds/row = roughly 6 hours for 10,000 rows.
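The batch-then-row-by-row fallback described above can be sketched roughly as follows; the helper names and structure are hypothetical assumptions, since Replicate's real apply logic is not public:

```python
# Rough, hypothetical sketch of the fallback described above --
# not Replicate's actual code path.

class DuplicateKeyError(Exception):
    pass

def insert_rows(target, rows, key_cols):
    """Insert rows, raising if any row's key already exists on the target."""
    existing = {tuple(r[c] for c in key_cols) for r in target}
    for r in rows:
        if tuple(r[c] for c in key_cols) in existing:
            raise DuplicateKeyError(r)
    target.extend(rows)

def apply_batch_with_fallback(target, batch, key_cols):
    """Try the whole batch in one statement; on a duplicate-key error,
    fall back to one statement per row to find (and skip) the offenders.
    At ~2 s per statement on a slow target, a 10,000-row batch takes hours."""
    try:
        insert_rows(target, batch, key_cols)
    except DuplicateKeyError:
        for row in batch:
            try:
                insert_rows(target, [row], key_cols)
            except DuplicateKeyError:
                pass  # Ignore here; could also Log, Suspend, or Stop
```

The row-by-row loop is the expensive part: one statement per row instead of one per batch, which is exactly why the pre-delete strategy can be faster on batch-oriented targets.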

Replicate never converts the operation after the fact. Instead, it prevents the condition by pre-deleting ALL target rows from the batch (not just the duplicates) first and then re-inserting all the rows.

That may or may not be more effective depending on the target DB (certainly more effective for Snowflake), and furthermore it may or may not fit the customer's data-integrity wishes. As such, it may never become the default, IMnsHO.

Cheers,

Hein.

 

MoeE
Partner - Creator III
Author

Hi @Heinvandenheuvel , @john_wang ,

Apologies for the late reply. I am on QR May 2023, Build 322. Yes, the following error handling settings are indeed set:

MoeyE_0-1733783188464.png

Furthermore, batch apply mode is set for CDC.

I was doing a bit of research and found this QR article on MERGE and UPSERT, which states that when batch apply mode is enabled:

"Either option (Upsert/Merge) does an unconditional Delete of all rows in the batch, followed by an Insert of all rows.

Note: The other thing to note is that with this setting the actual update that fails is inserted in a way that may not be obvious and could cause issue with downstream processing. In batch apply mode the task will actually issue a pair of transactions (1st a delete of the record and then 2nd an insert) this pair of transactions is unconditional and will result in a "newly inserted row every time the record is updated on the source."

So my understanding is that when these settings are enabled, Replicate will always apply changes by doing a delete then an insert, even if the change is an update.

However, for transactional apply, "the original statement is run and if it errors out then the switch is done (try and catch)".
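The transactional-apply behavior quoted above might look something like this sketch. The function and its conversion rules are illustrative assumptions, not Replicate's code:

```python
# Illustrative only: transactional apply as a try/catch that converts
# the operation when the original statement fails on the key.

def apply_transactional(target, op, row, key_cols):
    """target: list of row dicts; op: "INSERT" or "UPDATE"."""
    k = tuple(row[c] for c in key_cols)
    index = {tuple(r[c] for c in key_cols): i for i, r in enumerate(target)}
    if op == "INSERT":
        if k in index:
            target[index[k]] = row   # duplicate key -> converted to UPDATE
        else:
            target.append(row)
    elif op == "UPDATE":
        if k in index:
            target[index[k]] = row
        else:
            target.append(row)       # missing row -> converted to INSERT
```

Unlike the batch-mode delete-then-insert pair, each change here is a single statement, with the conversion happening only when the original operation cannot succeed.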

Thanks for the help guys.

Regards,

Mohammed

MoeE
Partner - Creator III
Author

Hi @Heinvandenheuvel , @john_wang,

I did a bit of testing on how each works, with TARGET_APPLY logging increased. This was done with a unique index enabled on the target table but not the source. Batch apply mode is enabled, and Upsert and Merge are enabled. Furthermore, AR_H_OPERATION was added to the table. These were the results:

  • Update a record that doesn’t exist in the target
    • DELETE then INSERT occurs in the logs
    • Shown as UPDATE in the AR_H_OPERATION column and console
  • Insert a duplicate record
    • DELETE then INSERT occurs in the logs
    • Shown as INSERT in the console and in the AR_H_OPERATION column

Next I tested after removing only the key from the table in QR then doing a reload. Here are the results.

  • Update a record that doesn't exist in the target
    • Shown as an UPDATE in the QR console
    • Looks like an INSERT occurs, then an UPDATE
  • Insert a duplicate record
    • Duplicate record is inserted into table
    • Logs show INSERTS only

Finally I tested with the key only being defined in QR but not the target table or the source:

  • Update a record that doesn't exist in the target
    • Nothing happens in the target
  • Insert a duplicate record
    • Duplicate created on the source but not the target

Regards,

Mohammed

john_wang
Support

Hello @MoeE ,

Yes, this is how Qlik Replicate works. Thanks for the detailed information.

>> Upsert and Merge is enabled

I'm a bit confused: first, for an Oracle source to SQL Server target task, the Merge option is not available; second, enabling both of them is not supported. You can only enable one of them, even for cloud target endpoints.

Regards,

John.

DesmondWOO
Support

Hi @MoeE ,

Please see this link for more information: Apply Changes using SQL Merge. As John mentioned, once you enable 'Apply changes using SQL MERGE', if you return to the 'Apply Conflicts Handling Policy' page, all options will be grayed out.

Regards,
Desmond
