I have been investigating high latency in my solution, and it seems to be caused by a specific, high-activity table (RESB in an SAP database).
I noticed this message in the log and wondered if it could point me to a more specific cause that I could investigate:
00018076: 2022-12-28T09:39:44 [TARGET_APPLY ]T: Finish bulk because stream timeout (bulk_apply.c:1012)
Thanks
Martin
A few other points of clarification:
The task that shows the noted message has only the one high-activity table (RESB) and is showing 3 hours of latency but with zero incoming changes.
And even with zero incoming changes, the apply throughput graph is showing 6-10 records/second being applied.
I have 4 other tasks all reading from the same source and writing to the same target, but they all have latency < 10 seconds.
Hi @Martin11
This log entry doesn't indicate a problem; it just informs you that the batch of data being processed was applied to the target based on the length of time rather than the amount of data. Batches can be applied based on either time or amount of data. This is controlled by the settings in Task Settings \ Change Processing \ Change Processing Tuning. More information can be found here:
Change Processing Tuning ‒ Qlik Replicate
When you select "Incoming Changes" on the Monitor tab, are you seeing changes in memory or on disk waiting for target commit? If you select "Apply Latency", are you seeing source or target latency?
Here's some general information about latency:
Latency is the time interval in seconds between the time a change was committed in the source system and the time it is applied and committed in the target system.

Source Latency: The gap in seconds between the original change in the source endpoint and capturing it.

Target Latency: The gap in seconds between the original change in the source endpoint and applying it to the target endpoint.

Apply Latency: The gap in seconds between capturing the change in the source endpoint and applying it to the target endpoint.

Note: During data capture, the target latency will always be equal to the source latency (even though no data has reached the target yet). This is simply because target latency is the sum of source latency + apply latency, and can therefore never be less than the source latency.
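To put those definitions in concrete terms, here is a minimal sketch (the timestamps and variable names are purely illustrative, not Replicate internals):

    from datetime import datetime

    # Illustrative timeline for a single change event.
    committed_at_source = datetime(2022, 12, 28, 9, 39, 0)   # change committed on the source
    captured_by_task    = datetime(2022, 12, 28, 9, 39, 4)   # change read from the transaction log
    applied_to_target   = datetime(2022, 12, 28, 9, 39, 10)  # change committed on the target

    source_latency = (captured_by_task - committed_at_source).total_seconds()   # 4.0
    apply_latency  = (applied_to_target - captured_by_task).total_seconds()     # 6.0
    target_latency = (applied_to_target - committed_at_source).total_seconds()  # 10.0

    # Target latency is the sum of the other two, so it can never be less than source latency.
    assert target_latency == source_latency + apply_latency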
Latency when applying large transactions:
For example, when the most recent latency value was 10 seconds and now a transaction of one million rows gets committed at the source endpoint, Attunity Replicate starts to apply that transaction to the selected target, and it will take some time to write all the changes to the target (for example, 60 seconds). During the next 60 seconds, the latency value gradually grows to 70 seconds for the last change in the transaction. Once the transaction is committed, the latency drops back to the 'regular' latency (10 seconds in this case).
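To make the arithmetic concrete, a quick sketch using only the numbers from that example:

    base_latency = 10      # 'regular' latency before the big transaction, in seconds
    apply_duration = 60    # time needed to write the one-million-row transaction

    # Reported latency while the transaction is being applied climbs from 10s to 70s:
    for elapsed in range(0, apply_duration + 1, 20):
        print(f"{elapsed:2d}s into the apply -> latency ~{base_latency + elapsed}s")
    # Once the transaction commits on the target, latency drops back to ~10s.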
Latency when no transactions are being applied:
When a time period passes with no changes applied to the target, the latency calculation is based on the time difference between the current time and the timestamp of the last change event read from the transaction log. This could happen if, for example, there is high activity on tables which are not selected for replication in the current task.
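Put differently, the idle-time latency is just a clock difference. A minimal sketch, with illustrative names and timestamps chosen to match the 3-hour symptom discussed here:

    from datetime import datetime

    # Timestamp of the last change event read from the transaction log
    # (for any table, including tables not selected for replication in this task).
    last_event_read_at = datetime(2022, 12, 28, 6, 39, 44)

    # With no changes being applied, latency is measured against the current time.
    now = datetime(2022, 12, 28, 9, 39, 44)
    idle_latency_hours = (now - last_event_read_at).total_seconds() / 3600
    print(idle_latency_hours)  # 3.0 hours, even though nothing was applied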
Thanks,
Dana
Hi Dana,
With regards to Latency when no transactions are being applied:
I witnessed a situation where 1 of my 5 tasks had 3 hours of latency and the other 4 tasks all had less than 10 seconds of latency.
(All 5 tasks are reading from the same source and writing to the same target.)
If latency can be the difference between current time and last change event read from the transaction log (even if not for task tables), shouldn't all 5 tasks have a similar latency ?
Thanks,
Martin
Hi @Martin11
Were there no incoming changes or pending changes for the target for any of the 5 tasks?
Thanks,
Dana
Hi Dana,
There were, but none had latency of more than 10 seconds associated with those changes.
So if latency is also representative of changes for tables not selected for replication by the task, I'm struggling to understand how the latency is so wildly different among these 5 tasks that have common source/target endpoints.
Martin
Hi @Martin11
Latency is only based on activity in tables not selected in the task when there is no activity in the tables that do belong to the task.
Thanks,
Dana
Hi Dana,
Is there any way to configure the task so that, if there is no activity for the tables selected in the task, the latency shows zero?
Thanks,
Martin
@Dana_Baldwin provided some good information on latency. Please be sure to enable logging for PERFORMANCE set to TRACE (in the current version, VERBOSE does not add information beyond TRACE) and let us know which latency we are dealing with. Are there any 'waiting for source commit' pending transactions?
You reported "Finish bulk because stream timeout" in your opening post and in the subject of this topic. It may be good to explain that a little, to understand whether it is relevant at all and, if so, whether it is a cause or an effect.
First, what does it mean to "finish a bulk"? It is very much an intermediate stage. It means that the 'bulk_apply' engine inside the 'TARGET_APPLY' thread decided it has gathered enough changes (from the SORTER thread) for them to be applied, and thus to START sending them to the target. That decision is based on the choices under task settings - Change Processing Tuning - Batch tuning.
The easy one is when there are more changes than <Force apply a batch when processing memory exceeds (MB):> allows for; the resulting message in the reptask log will be "Bulk finished because memory usage has exceeded the limit". After that message the task will start sending that bulk to the target, which involves filling an 'attrep_changes' table, perhaps first stuffing the changes into a CSV file to be transferred over the network and applied, after which the SQL apply statements start. "The bus is full."
Timeout is trickier because there are two!
Parameter "Longer than (seconds): " (default 1). This roughly maps to a formula : "now" > BulkStartTime + ApplyTimeoutMin and results in --> "Finish bulk because stream timeout". This is where the 'bulk_apply' code has been provided a change and waits around some to see it can put more changes in the bulk. Low values improve 'near realtime' behavior. Higher values allow for larger bulks which reduces resource use on targets. This is a bus driver keeping the door open to see if there is someone coming in running.
Parameter "But less than (seconds):" (default 30). This roughly maps to a formula : "now" > BulkStartTime + ApplyTimeout AND NOT EmptyBulk and results in --> "Finish Bulk because Bulk timeout"). The sorter keeps providing changes. Perhaps a steady stream of unrelated events, perhaps lots of changes for a single table in a transaction. They are added to the pile while there is still room. But eventually they have to be released to the target. Again low values improve 'near realtime' behavior. Higher values allow for larger bulks which reduces resource use on targets and very high values will lead to bulk filling up memory. This is a bus driver closing the door in your face 'sorry folks I gotta go I have a schedule to try to maintain and those on the bus already have been more than patient.'
Hope this helps understanding whether the "Finish bulk because stream timeout" is just fine or not.
For further help, please clarify whether those other tasks truly work on the same source and target (and why more than 1 task?) and perhaps provide a bit of reptask log, with at least 'performance trace' and hopefully 'target_apply trace', for at least a minute but preferably 10 or more minutes, ideally showing a good task versus a bad task (the 'bad' task log is most important, of course). Oh, and how about a bad task JSON?
Cheers,
Hein.