guilherme-matte
Partner - Creator

Failed to start bulk apply transaction

Hello Team!

I'm having issues with one task in Replicate when using an Oracle endpoint to Azure SQL DB.

The task runs fine, but at 4 AM, when an internal job runs on the server, I start seeing the following issue:


Failed to start bulk apply transaction
Failed to execute delete statement.
~{FAQAAKW0khrf8AhRPieIDR0GA+Tk9lH+Ly8/YEfx1/JuCV1z9MCD2otGj7Rk3Eu93RbqYADGNWREHoPb8PjStY2rd0uOuLmSv8YmEIwXaqcD+i7an77DocHGp+mGQWfcOO5//rbV8kO6u/56Ooz2nAfO5UzLxv7Fa6UHidjaXzHXzFJhfCVs8KiKMqcVwLQOfBVVk66OcnIxjJwFekFUmWDyMOyDW18BYRx667+c7asLullIsL8w37TAPOxr8NZ00GZCUufDs8O1k3H+M74HWbosxj2QCY2m7n84jdeR6Hkb//vfqpvKU8PiYnJAmNqIPXoMNcZ86RqRuD4Z6p+qq7T1Zz650tEstqEqYHgSH0L7TOghWxqjNDBqe10d1GSzfawqhMrzxNsvDGv6+Y2HoEK8s9cWsbod7R4J4o/gqMW+p+zFYPJbPncXv2xG7J3SGeW7oirml/cDiVEOyI4Rtxu/6WR3m7nuUddL2DMVZLQ0FbEBjXp84xlOFII4TP4vUKyQhw77hLYmm6M365vM6eVmBlnffHJetEohtb0MPPYCW68NCqy6qS2LPGgkFZnd8+Ga+B5HXm4/VemIj74QhRlmQ9Zmzc4bj9ik1BFR1JnzW550}~
RetCode: SQL_ERROR SqlState: HYT00 NativeError: 0 Message: [Microsoft][ODBC Driver 18 for SQL Server]Query timeout expired
Failed (retcode -1) to execute statement: 'DELETE FROM [FINPROD].[BIF3_ITEM_CODE_DETAILS] FROM [attrep_changesC50A62A66CFF4128] WHERE ([FINPROD].[BIF3_ITEM_CODE_DETAILS].[ITEM_CODE]= CAST ( [attrep_changesC50A62A66CFF4128].[seg1] as varchar(80)) COLLATE SQL_Latin1_General_CP1_CI_AS OR ([FINPROD].[BIF3_ITEM_CODE_DETAILS].[ITEM_CODE] IS NULL AND [attrep_changesC50A62A66CFF4128].[seg1] IS NULL)) AND [attrep_changesC50A62A66CFF4128].[seq] >= ? AND [attrep_changesC50A62A66CFF4128].[seq] <= ?'


From this point, the task writes to the target extremely slowly, barely making progress, which ends up requiring a manual stop since CDC changes start piling up. The log repeats the errors and warnings until the task is stopped.


[screenshot: guilhermematte_0-1690932076940.png]

The tables here only have unique indexes, so no PKs are generated on the target (only unique indexes).
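
For reference, a minimal T-SQL sketch to confirm what actually exists on the target table (this assumes access to the Azure SQL DB; the table name is taken from the error log above):

-- Sketch: list the indexes on the target table and whether any of them
-- is a primary key (table name taken from the error log).
SELECT i.name, i.type_desc, i.is_unique, i.is_primary_key
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID(N'FINPROD.BIF3_ITEM_CODE_DETAILS');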

Any ideas on what could be the source of this issue?


Kind Regards!


Michael_Litz
Support

Hi @guilherme-matte ,

It looks like the delete statement is taking much longer than expected to complete. You can try doubling the internal parameter executeTimeout; if doubling does not help, try four times the original value. On the Advanced tab, at the bottom, paste the parameter name in and increase the value.
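
To pick a sensible value, it helps to know how long the statement actually runs on the target. A minimal T-SQL sketch (assuming you can query the Azure SQL DB while the task is applying changes; the attrep_changes filter matches the net-changes table named in the error log):

-- Sketch: show elapsed time and text of currently running statements
-- that touch the Replicate net-changes table, on the target database.
SELECT r.session_id,
       r.status,
       r.total_elapsed_time / 1000 AS elapsed_seconds,
       t.text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE t.text LIKE N'%attrep_changes%';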

Also, please look into what the internal job is doing. If it runs at the same time every day, you could schedule the task to stop before the internal job runs and start again after the job is complete.

Thanks,
Michael


Dana_Baldwin
Support

Hi @guilherme-matte 

In addition to what @Michael_Litz suggested, what type of job runs at 4 AM? Can it be optimized to reduce its impact on the database?

Thanks,

Dana

guilherme-matte
Partner - Creator
Author

Hey @Michael_Litz, @Dana_Baldwin, thank you both for the replies!

Just to avoid any misunderstandings: the job runs at 4 AM on the Oracle source, not on the Replicate server.

Noob question here: would stopping and resuming the task around the time the job runs change the outcome? Wouldn't Replicate follow the same steps in order to synchronize the data across both databases?

I have already isolated that table and will create another task in order to try this parameter tweak. @Dana_Baldwin unfortunately the client is not willing to change the source in this case, so I will need to work around it.

Will keep you guys posted.

guilherme-matte
Partner - Creator
Author

Hey @Michael_Litz, I've checked this parameter and it is already set to 0 by default, which I imagine means there is no timeout limit on the query, correct?

john_wang
Support

Hello @guilherme-matte ,

The internal parameter executeTimeout has a default value of 60 seconds. For example, this is how you add it to the endpoint (if it does not exist yet):

[screenshot: john_wang_0-1690948334306.png]

We can verify its current setting in several ways:

1- export the task and check the JSON file

2- check it in the GUI

3- check the task log file to see the time difference between the previous line and the error message line (best done with Verbose logging level)

BTW, in general we do not suggest setting it to zero, because if there is an exception Replicate then takes too long to surface the real error message. Set it to a reasonable value, e.g. 5 minutes, half an hour, etc.

Hope this helps.

Regards,

John.

guilherme-matte
Partner - Creator
Author

Hello @john_wang !

Thank you for the feedback,

That's quite weird though; you can see from my screenshot that the default is 0.

If I change it to another value and click the "reset to default value" icon, it goes back to 0.

[screenshot: guilhermematte_0-1690949310202.png]

Just wanted to point it out; not that I think it is related to the slow DELETE issue from this topic.


Edit: in this case I'm looking at the source endpoint connector, which is Oracle.

Cheers!

john_wang
Support

Hello @guilherme-matte ,

Thanks for the update. My apologies, I should have made it clearer in my previous comment:

The internal parameter executeTimeout has a default value of 60 seconds when it is added to the target endpoint (if it does not exist yet).

In your scenario, for an Oracle source, executeTimeout should always be set to 0. The best practice is not to set it in the SOURCE endpoint at all; Replicate will take the default value for the specific endpoint.

Hope this helps.

Regards,

John.

Heinvandenheuvel
Specialist III

The issue is caused by using a unique index (UI) instead of a PK on the target.

LOOK! Look at the SQL your target has to execute. Simplified:


DELETE FROM ...
WHERE ( ITEM_CODE = seg1
     OR ( ITEM_CODE IS NULL AND seg1 IS NULL ) )
AND seq >= ? AND seq <= ?


The SQL execution engine has the right to evaluate ITEM_CODE IS NULL first. That would mean a full table scan, because there cannot be an index entry with the value NULL. If there are many rows in the change table (that job on the source?), then it may opt for the full table scan on the base table, which would surely take 'forever'. Did it?
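
One hedged way to check that theory is to look at the plan for the NULL probe on its own. A minimal T-SQL sketch (names taken from the error log; SHOWPLAN only displays the estimated plan and does not execute the query):

-- Sketch: show the estimated plan for the IS NULL probe without running it.
-- A scan operator here supports the full-table-scan theory above.
SET SHOWPLAN_TEXT ON;
GO
SELECT COUNT(*)
FROM FINPROD.BIF3_ITEM_CODE_DETAILS
WHERE ITEM_CODE IS NULL;
GO
SET SHOWPLAN_TEXT OFF;
GO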

Switch the TARGET_APPLY logging component to TRACE. Now look at the reported query performance throughout the day and after the 4 AM source job. Use a script to tally the number of rows to be affected (seq from X to Y) and the DURATION (the timestamp of the next TARGET_APPLY line on the same stream minus the timestamp of the query line).

[I created a Perl script 'apply_summary' just for this reason many years ago. It served me well]

How long does it typically take? Find your MIN, MAX, and AVG, and look at time/quantity for outliers.

Now go play in isolation in a DEV environment. Try 1 change vs. 1,000 changes vs. 100,000 changes. Try recreating the target as a PK table, and so on. Monitor the target resources while doing so: CPU, page lookups, execution plan.
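
For the PK variant of that test, a DEV-only sketch (it assumes ITEM_CODE can be made NOT NULL, which the NULL handling in the generated DELETE suggests is not guaranteed; the varchar(80) length is taken from the CAST in the error log):

-- DEV-only sketch: convert the unique-index table to a primary-key table.
-- A PK column must be NOT NULL, so this only works if ITEM_CODE never
-- actually holds NULLs in the data.
ALTER TABLE FINPROD.BIF3_ITEM_CODE_DETAILS
    ALTER COLUMN ITEM_CODE varchar(80) NOT NULL;

ALTER TABLE FINPROD.BIF3_ITEM_CODE_DETAILS
    ADD CONSTRAINT PK_BIF3_ITEM_CODE_DETAILS PRIMARY KEY (ITEM_CODE);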

Good luck,

Hein


Dana_Baldwin
Support

Hi @guilherme-matte 

With regard to Michael's suggestion to stop the task during the maintenance job: that would reduce the overhead on source/target for the duration of the job. When the task resumes after the job is complete, it will start processing changes from where it left off (provided, of course, your source archive logs are available from the time the task stopped to the present).
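
If you want to verify that coverage before relying on a scheduled stop, a hedged Oracle-side sketch (run as a user with access to the V$ views; the one-day window is an arbitrary example):

-- Sketch: confirm that archived logs covering the stop window still exist
-- on the source. DELETED = 'YES' rows are no longer available on disk.
SELECT sequence#, first_time, next_time, deleted
FROM v$archived_log
WHERE next_time > SYSDATE - 1
ORDER BY sequence#;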

Thanks,

Dana