MoeE
Partner - Creator III

How do keys defined in Replicate prevent duplicates from reaching the target?

Hi,

I have run into some interesting behaviour while doing some testing. I have a task that replicates a single table from Oracle to SQL Server. Neither the source table nor the target table has a primary key or unique index defined. However, in Replicate -> Table Settings -> Transform, I've defined a combination of three columns as keys.

MoeyE_0-1733445220992.png

I tested inserting some duplicates on the source database; however, these duplicates are not applied to the target database, even though the target has no primary key or unique index defined on it. How exactly does this work? How does Replicate identify duplicates? Does it read the incoming records and compare them against the target table to decide whether or not to apply them? Is this less performant than having an actual PK/unique index on the target table? Please help me understand how this works; it's appreciated as always. Thanks.

Regards,

Mohammed

Labels (2)
8 Replies
john_wang
Support

Hello Mohammed, @MoeE 

Good morning and thanks for reaching out!

In newer major versions, Qlik Replicate does not check for duplicates when applying changes to the target. Instead, it improves performance by first deleting the corresponding rows (based on the primary key) and then inserting the updated rows.

To analyze the details of this behavior, you can set the TARGET_APPLY logging component to Verbose and review the task log file.
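As an illustration only (not Replicate's actual internals), the delete-then-insert apply described above can be sketched like this: every target row whose Replicate-defined key appears in the incoming batch is deleted, then the incoming rows are inserted, so source-side duplicates collapse to a single row per key:

```python
# Illustrative sketch only -- not Replicate's actual implementation.
# Applies a change batch by deleting target rows that match the
# Replicate-defined key, then inserting the incoming rows.

def apply_batch(target_rows, incoming_rows, key_cols):
    """Rows are dicts; key_cols is a tuple of the columns defined as the key."""
    def key(row):
        return tuple(row[c] for c in key_cols)

    incoming_keys = {key(r) for r in incoming_rows}
    # Step 1: delete every target row whose key appears in the batch.
    survivors = [r for r in target_rows if key(r) not in incoming_keys]
    # Step 2: insert the batch; the last change per key wins, so
    # duplicate keys never reach the target as extra rows.
    latest = {key(r): r for r in incoming_rows}
    return survivors + list(latest.values())
```

This is why no duplicate check against the target is needed: the delete step guarantees the key is absent before the insert happens.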

Regards,

John.

Help users find answers! Do not forget to mark a solution that worked for you! If already marked, give it a thumbs up!
Heinvandenheuvel
Specialist III

@MoeE - What version are you using?

@john_wang  - "In newer major versions"

Please indicate the transition point.

@john_wang    - "Replicate...  improves performance by first deleting the corresponding rows ... "

Hmmm, is it (now?) unconditionally doing this?

My understanding is that this is only used when the task setting for Error Handling - Apply Conflicts - "Duplicate key when applying INSERT" is set to "UPDATE the existing target record".

Hein.

(Screenshot: task settings, Error Handling page)

john_wang
Support

Hello @Heinvandenheuvel , @MoeE ,

Yes, you are right, Hein. In Qlik Replicate 2024.5/2024.11, it unconditionally deletes first and then inserts the updated row back when the task is set to UPSERT mode.

In previous versions, Replicate would detect the conflict first and, if a PK violation occurred, convert the operation.

Regards,

John.

Heinvandenheuvel
Specialist III

@john_wang "it's unconditionally to do deleting first then inserting updated row back if the task set "

We are miscommunicating. When I wrote unconditionally, I meant without customer control, out of the box.

But it seems it still _is_ conditional on upsert mode, which is an error-handling end-user choice and OFF by default.

>> In previous versions, Replicate will detect the confliction first, if PK violates then convert the operations."

I beg to differ. In previous (and current!?) versions, it would submit a batch of rows to update; a duplicate error would be found, and all of the updates would fail. Replicate would then attempt an update for each individual row in the batch for that table to determine which row caused the batch failure, and then Ignore, Log, Suspend, or Stop, unless "UPDATE the existing target record" is selected. This may take 'forever' for certain non-OLTP targets, e.g. 2 seconds/row = roughly 6 hours for 10,000 rows.
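The batch-then-row-by-row fallback described above can be sketched roughly as follows; the helper names and structure are hypothetical assumptions, since Replicate's real apply logic is not public:

```python
# Rough, hypothetical sketch of the fallback described above --
# not Replicate's actual code path.

class DuplicateKeyError(Exception):
    pass

def insert_rows(target, rows, key_cols):
    """Insert rows, raising if any row's key already exists on the target."""
    existing = {tuple(r[c] for c in key_cols) for r in target}
    for r in rows:
        if tuple(r[c] for c in key_cols) in existing:
            raise DuplicateKeyError(r)
    target.extend(rows)

def apply_batch_with_fallback(target, batch, key_cols):
    """Try the whole batch in one statement; on a duplicate-key error,
    fall back to one statement per row to find (and skip) the offenders.
    At ~2 s per statement on a slow target, a 10,000-row batch takes hours."""
    try:
        insert_rows(target, batch, key_cols)
    except DuplicateKeyError:
        for row in batch:
            try:
                insert_rows(target, [row], key_cols)
            except DuplicateKeyError:
                pass  # Ignore here; could also Log, Suspend, or Stop
```

The row-by-row loop is the expensive part: one statement per row instead of one per batch, which is exactly why the pre-delete strategy can be faster on batch-oriented targets.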

Replicate never converts the operation after the fact. Instead, it prevents the condition by pre-deleting ALL target rows from the batch (not just the duplicates) first and then re-inserting all the rows.

That may or may not be more effective depending on the target DB (certainly more effective for Snowflake), and furthermore it may or may not fit the customer's data-integrity wishes. As such, it may never become the default, IMnsHO.

Cheers,

Hein.

 

MoeE
Partner - Creator III
Author

Hi @Heinvandenheuvel , @john_wang ,

Apologies for the late reply. I am on QR May 2023, Build 322. Yes, the following error handling settings are indeed set:

MoeyE_0-1733783188464.png

Furthermore, batch apply mode is set for CDC.

I was doing a bit of research and found this QR article on MERGE and UPSERT, which states that when batch apply mode is enabled:

"Either option (Upsert/Merge) does an unconditional Delete of all rows in the batch, followed by an Insert of all rows.

Note: The other thing to note is that with this setting the actual update that fails is inserted in a way that may not be obvious and could cause issue with downstream processing. In batch apply mode the task will actually issue a pair of transactions (1st a delete of the record and then 2nd an insert) this pair of transactions is unconditional and will result in a "newly inserted row every time the record is updated on the source."

So my understanding is that when these settings are enabled, Replicate will always apply changes by doing a delete then an insert, even if the change is an update.

However, for transactional apply, "the original statement is run and if it errors out then the switch is done (try and catch)".
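The transactional-apply behavior quoted above might look something like this sketch. The function and its conversion rules are illustrative assumptions, not Replicate's code:

```python
# Illustrative only: transactional apply as a try/catch that converts
# the operation when the original statement fails on the key.

def apply_transactional(target, op, row, key_cols):
    """target: list of row dicts; op: "INSERT" or "UPDATE"."""
    k = tuple(row[c] for c in key_cols)
    index = {tuple(r[c] for c in key_cols): i for i, r in enumerate(target)}
    if op == "INSERT":
        if k in index:
            target[index[k]] = row   # duplicate key -> converted to UPDATE
        else:
            target.append(row)
    elif op == "UPDATE":
        if k in index:
            target[index[k]] = row
        else:
            target.append(row)       # missing row -> converted to INSERT
```

Unlike the batch-mode delete-then-insert pair, each change here is a single statement, with the conversion happening only when the original operation cannot succeed.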

Thanks for the help guys.

Regards,

Mohammed

MoeE
Partner - Creator III
Author

Hi @Heinvandenheuvel , @john_wang,

I did a bit of testing on how each works, with TARGET_APPLY logging increased. This was done with a unique index enabled on the target table but not the source. Batch apply mode is enabled, and Upsert and Merge are enabled. Furthermore, AR_H_OPERATION was added to the table. These were the results:

  • Update a record that doesn’t exist in the target
    • DELETE then INSERT occurs in the logs
    • Shown as UPDATE in the AR_H_OPERATION column and console
  • Insert a duplicate record
    • DELETE then INSERT occurs in the logs
    • Shown as INSERT in the console and in the AR_H_OPERATION column

Next I tested after removing only the key from the table in QR then doing a reload. Here are the results.

  • Update a record that doesn't exist in the target
    • Shown as an UPDATE in the QR console
    • Looks like an INSERT occurs, then an UPDATE
  • Insert a duplicate record
    • Duplicate record is inserted into table
    • Logs show INSERTS only

Finally I tested with the key only being defined in QR but not the target table or the source:

  • Update a record that doesn't exist in the target
    • Nothing happens in the target
  • Insert a duplicate record
    • Duplicate created on the source but not the target

Regards,

Mohammed

john_wang
Support

Hello @MoeE ,

Yes, this is how Qlik Replicate works. Thanks for the detailed information.

>> Upsert and Merge is enabled

I'm a bit confused: first, for an Oracle source to SQL Server target task, the Merge option is not available; second, enabling both of them is not supported. You can only enable one of them, even for cloud target endpoints.

Regards,

John.

DesmondWOO
Support

Hi @MoeE ,

Please see this link for more information: Apply Changes using SQL Merge. As John mentioned, once you enable 'Apply changes using SQL MERGE', if you return to the 'Apply Conflicts Handling Policy' page, all options will be grayed out.

Regards,
Desmond
