itdidatalake
Contributor

Attunity Replicate is unable to Replicate data for a transaction

Hi Team, 

We have an existing Attunity Replicate task ingesting data from a table in SQL Server. A few updates were performed on the table, but the Replicate task did not pull any data into our S3 bucket. Upon checking the logs we found the errors below on the task, so we stopped the task and restarted it from early that morning using the advanced run options; that did not help either. Finally, we increased the CDC timeout option to two hours (7200 sec), and after about 3 hours of latency the task pushed the records to our S3 bucket.
We were unable to figure out the root cause for this. Could you please help us figure out what the issue is and recommend any other updates or settings for the task to avoid issues in the future?

Thanks,
Sumanth

itdidatalake_0-1670453511906.png

 

3 Replies
Michael_Litz
Support

Hi,

Please check the S3 endpoint in the change processing section and look at the file size setting and the time setting:

Michael_Litz_0-1670454714182.png

Please try lowering the time setting and see if that helps.

Thanks,
Michael

 

lyka
Support

Hello,

 

Based on what you described, I assume that you encountered this type of error:

[SOURCE_CAPTURE ]E: RetCode: SQL_ERROR SqlState: HYT00 NativeError: 0 Message: [Microsoft][ODBC Driver 17 for SQL Server]Query timeout expired [1022502] (ar_odbc_stmt.c:3467)

The timeout error happens when Replicate issues a query against the source (or sometimes the target) and it takes a while to get a response. If the query takes longer than the configured timeout value, you will get this error.
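
To make this concrete, here is a minimal Python sketch (completely outside of Replicate; the connection string, table name, and 30-second timeout are made-up examples) of how the ODBC driver raises the same HYT00 "Query timeout expired" error when a statement runs longer than the configured timeout:

# Minimal sketch of an ODBC query timeout (SqlState HYT00).
# Server, database, table and the 30-second timeout are hypothetical.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=my-sql-host;DATABASE=mydb;Trusted_Connection=yes;"
)
conn.timeout = 30  # query timeout in seconds; 0 means no timeout

try:
    # A statement that has to scan a lot of data can exceed the 30-second
    # limit, at which point the driver aborts it and reports HYT00.
    conn.cursor().execute("SELECT COUNT(*) FROM dbo.some_very_large_table")
except pyodbc.Error as exc:
    print("Query timeout expired:", exc)  # same SqlState as in your Replicate log

Raising the Replicate timeout parameters simply gives queries like this more time to finish.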

The following timeout parameters can be set on the SQL Server Endpoint > Advanced Settings:

 

cdcTimeout
>> Helps with timeout issues when Replicate is reading the transaction log. Sets the timeout for CDC queries such as the fn_dblog or fn_dump_dblog calls.

executeTimeout
>> Sets the timeout when waiting for a query executed on the database, i.e. execute operations such as running a TRUNCATE or an INSERT from SELECT.
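
For what it's worth, once you add these under the endpoint's Advanced settings and export the task, they should show up as extra keys in the source endpoint section of the task JSON. The fragment below is only a rough, hypothetical sketch: the surrounding key names and the executeTimeout value are guesses from memory, so verify against your own export (7200 is the two-hour value you already tried):

"db_settings": {
    "server": "my-sql-host",
    "database": "mydb",
    "cdcTimeout": 7200,
    "executeTimeout": 600
}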

Hope this helps!

 

Thanks

Lyka

 

Heinvandenheuvel
Specialist III

Thanks for sharing the log. Maybe also share the task JSON, or at least the source endpoint details?

This is an unhappy source with tons of data to chew through. I'm saying this based on the "Throughput monitor" lines. Those _should_ appear every two minutes when the source is caught up, that is, when it is reading the online log and getting to the end with the "select top <logScanBatchSize> * from fn_dblog", or at the end of each "SELECT * FROM fn_dump_dblog" used for reading an archived log.
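
In case you want to poke at those log readers yourself, here is a rough Python/pyodbc sketch of the same kind of queries (the DSN, batch size and backup path are made up, and Replicate's internal versions differ in detail):

# Rough sketch of the two SQL Server log-reader queries mentioned above,
# issued via pyodbc purely for illustration. DSN, batch size and backup
# path are hypothetical; Replicate builds its own versions internally.
import pyodbc

cur = pyodbc.connect("DSN=my_sqlserver_source").cursor()

# Online transaction log, read in batches, as in
# "select top <logScanBatchSize> * from fn_dblog".
cur.execute("SELECT TOP 10000 [Current LSN], Operation FROM fn_dblog(NULL, NULL)")

# Archived (backed-up) transaction log: fn_dump_dblog reads a log backup file.
# It takes 68 arguments; everything after the backup path is left at DEFAULT.
cur.execute(
    "SELECT [Current LSN], Operation FROM fn_dump_dblog("
    "NULL, NULL, N'DISK', 1, N'D:\\backups\\mydb_log.trn'"
    + ", DEFAULT" * 63
    + ")"
)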

Now, since you started 'back in time', archive log reads are likely. Around 03:00 in the log those reads fetch about 40 million events each and appear about 10 minutes apart, easily confirmed by the 'Task is running' messages, which should show every 10 minutes. It starts with 3 hours of latency and catches up a little.

Around 06:00 those selects read 100 million events each and take 30 minutes each, and one took an hour. Those are big archives, I'm guessing 50 - 100 GB each. Correct? That's a lot of work. The task falls behind to 3+ hours. The event volume could be caused by a large batch job, but more likely is the result of 'rebuilds'. Are there enough CPU and IO resources on the source for all of that? Is the Replicate CDC connection being throttled down?

After 08:00 the task starts to catch up to about 15 minutes, but it never shows the 2-minute pattern. I'm pretty sure the source endpoint is set to 'archive log only' and those archives are created every 15 minutes? Correct?

Now at 2022-12-06T12:26:24 the task is stopped and an email is sent.

Only after that time do these SOURCE_CAPTURE errors start to pop up. I suspect that the task rundown had already closed the SQL Server source connections, but somehow the SOURCE_CAPTURE thread decided it still needed to look up a fresh partition ID it found in an event. That again hints at a rebuild - not sure.

All of this last part would be a hard bug, so please submit a support case with your detailed version number, task JSON, the complete log, and, if you can, a usage/environment scenario.

The slowness itself is unlikely to be a bug. That may need some consulting work, and it might not even be possible to resolve. IFF rebuilds are in play, then you may want to try stopping the Replicate task during that event and resuming after. If you know for sure there are no business events during the rebuild window, then you may consider 'skipping' those logs and starting with a timestamp just before the business activity is supposed to start.

Attached is a cryptic TXT file which I distilled from your log with a silly Perl script. Feed it into Excel.

log_time; db time; log_time in Excel-style day.fractionalday; db latency; source latency from PERFORMANCE log if present; target latency from PERFORMANCE log if present; elapsed time between monitor reports; events reported; rate based on events and elapsed log time; db_rate based on events and elapsed db time.
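
If you want to roll your own version of that distillation, a simplified, hypothetical Python sketch of the idea is below. It only produces a few of the columns, and the timestamp/latency patterns are assumptions about the log layout, so adjust them against your actual log:

# Simplified, hypothetical sketch of the log distillation (the original was Perl).
# The regexes are guesses about the log-line layout; the Excel day value
# counts days (with a fractional part) since 1899-12-30, as Excel does.
import re
import sys
from datetime import datetime

TS = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})")
LAT = re.compile(r"Source latency (\d+\.?\d*).*Target latency (\d+\.?\d*)")  # assumed
EXCEL_EPOCH = datetime(1899, 12, 30)

prev_ts = None
for line in open(sys.argv[1], errors="replace"):
    m = TS.search(line)
    if not m:
        continue
    ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S")
    excel_day = (ts - EXCEL_EPOCH).total_seconds() / 86400.0
    lat = LAT.search(line)
    src_lat, tgt_lat = (lat.group(1), lat.group(2)) if lat else ("", "")
    elapsed = (ts - prev_ts).total_seconds() if prev_ts else ""
    prev_ts = ts
    print(";".join(str(x) for x in (m.group(1), f"{excel_day:.5f}", src_lat, tgt_lat, elapsed)))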

Hth

Hein.