Solved: Latency Keeps Building Up - Oracle Connection - Qlik Community

nabeelaslam1994 · ‎2023-05-24

Hello Team,

I am running a CDC task against an oracle database. The latency keeps building up and even after restarting the task, It starts building up again. Please guide on this. Last latency went upto 24 hours

Thanks,

Nabeel

Heinvandenheuvel · ‎2023-05-25

>> Thanks @Steve_Nguyen . I did that and It is adding new data to the logs. What are the next steps possibly?

Well, you could - no you _should_ at least TRY to interpret that new data. Share out what you suspect are the relevant lines/chunks - include them as attached ".txt" files, or smaller parts in a text/code box in a reply.

@Steve_Nguyen >> look like the latency is on the source. there is a transaction that is not committed on the source

I beg to differ, but will happily be proven to be wrong. It may well be source latency, but not for the reason you indicate. An open transaction is just that. It does not triggers latency. That would not be 'fair'. You cannot 'blame' replicate for end-users not finishing a transaction. To be sure I just tried. Run task with CDC, "There are no changes indicated; SET TRANSACTION READ WRITE; INSERT RANDOM-TABLE VALUES (VALUE-LIST-FOR-THAT-RANDOM-TABLE); and wait. Replicate now shows 1 transaction waiting for source commit and the latency counts up to my retry interval and drops back to zero over and over. When I commit or rollback the waiting for count decreases but no latency impact.

@nabeelaslam1994 you need to first determine whether this is source or target latency (or both?!) and next determine who is waiting for what. Your first indication is in the logs with the PERFORMANCE lines. It tells you right there - source - target - processing. Remember, the target latency is always the bigger one, starting with the source latency and adding the processing. Alternatively, look at the latency graph - now click on the target latency to disable it's display - does the line switch from blue to orange? Ok, then the source latency was 'behind' the target latency in the picture.

Once you know where to look - start looking! for example for source latency pay close attention to the REDO/ARCH read details in the PERFORMANCE lines in the trace. If it si target latency, look at the last SQL requested from the target and judge whether it is an issue - or ask teh target (Snowflake) what queries it is working on for the Replicate user / process.

Good luck,

Hein

View solution in original post

nabeelaslam1994 · ‎2023-05-24

Target Connection is Snowflake.

Steve_Nguyen · ‎2023-05-24

@nabeelaslam1994

For latency and performance issues:

On the task's Monitoring tab, Tools drop-down menu, select Log Management.
On the screen that opens scroll down to the following items and set them over one position to the right, Trace.
The change will take effect immediately (no need to stop/resume the task).
The task log will be fairly large so limit this to only 20-30 minutes before returning these logging levels back to Info,

then review the log to see if latency is source or target.
PERFORMANCE ,, trace
SOURCE_CAPTURE ,, trace
TARGET_APPLY ,, trace
SORTER ,, trace
**make sure to not to enable : Store trace/verbose logging in memory, but if an error occurs write to the logs**

Help users find answers! Don't forget to mark a solution that worked for you! If already marked, give it a thumbs up!

nabeelaslam1994 · ‎2023-05-24

Thanks @Steve_Nguyen . I did that and It is adding new data to the logs. What are the next steps possibly?

Nabeel

nabeelaslam1994 · ‎2023-05-24

I see this as well.

Steve_Nguyen · ‎2023-05-24

@nabeelaslam1994 look like the latency is on the source.

there is a transaction that is not committed on the source

look at the sourc_capture, to see what information is looking for , could provide more information.

Help users find answers! Don't forget to mark a solution that worked for you! If already marked, give it a thumbs up!

Heinvandenheuvel · ‎2023-05-25

>> Thanks @Steve_Nguyen . I did that and It is adding new data to the logs. What are the next steps possibly?

Well, you could - no you _should_ at least TRY to interpret that new data. Share out what you suspect are the relevant lines/chunks - include them as attached ".txt" files, or smaller parts in a text/code box in a reply.

@Steve_Nguyen >> look like the latency is on the source. there is a transaction that is not committed on the source

I beg to differ, but will happily be proven to be wrong. It may well be source latency, but not for the reason you indicate. An open transaction is just that. It does not triggers latency. That would not be 'fair'. You cannot 'blame' replicate for end-users not finishing a transaction. To be sure I just tried. Run task with CDC, "There are no changes indicated; SET TRANSACTION READ WRITE; INSERT RANDOM-TABLE VALUES (VALUE-LIST-FOR-THAT-RANDOM-TABLE); and wait. Replicate now shows 1 transaction waiting for source commit and the latency counts up to my retry interval and drops back to zero over and over. When I commit or rollback the waiting for count decreases but no latency impact.

@nabeelaslam1994 you need to first determine whether this is source or target latency (or both?!) and next determine who is waiting for what. Your first indication is in the logs with the PERFORMANCE lines. It tells you right there - source - target - processing. Remember, the target latency is always the bigger one, starting with the source latency and adding the processing. Alternatively, look at the latency graph - now click on the target latency to disable it's display - does the line switch from blue to orange? Ok, then the source latency was 'behind' the target latency in the picture.

Once you know where to look - start looking! for example for source latency pay close attention to the REDO/ARCH read details in the PERFORMANCE lines in the trace. If it si target latency, look at the last SQL requested from the target and judge whether it is an issue - or ask teh target (Snowflake) what queries it is working on for the Replicate user / process.

Good luck,

Hein

nabeelaslam1994 · ‎2023-06-01

hi @Heinvandenheuvel @Steve_Nguyen and Team,

I set the performance to TRACE and see this now. The latency went up to 2 hr 45 mins .

00004680: 2023-06-01T09:58:00:672520 [PERFORMANCE ]T: Source latency 9631.67 seconds, Target latency 9631.67 seconds, Handling latency 0.00 seconds (replicationtask.c:3734)

Also I keep seeing this in the logs:

00001936: 2023-06-01T09:57:31:694831 [PERFORMANCE ]T: Start read from archived Redo log 51200000 bytes at offset 00000000030d4200 for requested offset 00000000030d4400, thread '1' (oradcdc_redo.c:1025)

00001936: 2023-06-01T09:57:42:841387 [PERFORMANCE ]T: Completed to read from archived Redo log 51200000 bytes at offset 00000000030d4200 with rc 1, read time is 11146 ms, thread '1' (oradcdc_redo.c:1042)

00001936: 2023-06-01T09:57:42:841387 [PERFORMANCE ]T: Start read from archived Redo log 51200000 bytes at offset 00000000c34f8400 for requested offset 00000000c34f8600, thread '2' (oradcdc_redo.c:1025)

00001936: 2023-06-01T09:57:45:691785 [PERFORMANCE ]T: Completed to read from archived Redo log 51200000 bytes at offset 00000000c34f8400 with rc 1, read time is 2850 ms, thread '2' (oradcdc_redo.c:1042)

00001936: 2023-06-01T09:57:45:744298 [PERFORMANCE ]T: Start read from archived Redo log 51200000 bytes at offset 00000000c65cc200 for requested offset 00000000c65cc400, thread '2' (oradcdc_redo.c:1025)

00001936: 2023-06-01T09:57:49:31289 [PERFORMANCE ]T: Completed to read from archived Redo log 51200000 bytes at offset 00000000c65cc200 with rc 1, read time is 3286 ms, thread '2' (oradcdc_redo.c:1042)

000

Do you know what all these means?

Nabeel

Heinvandenheuvel · ‎2023-06-01

>> Source latency 9631.67 seconds, Target latency 9631.67 seconds, Handling latency 0.00 seconds

Currently the root cause for the Replicate latency is getting changes from the source - 9632 seconds = almost 3 hours after they occurred. The target latency is the effect of source + handling.

BTW... if you click on that blue (V) Target legend on the first picture you included you'll remove the Blue target line and you'll find it was completely hiding the orange (V) Source latency line - the root.

>> Completed to read from archived Redo log 51200000 bytes at offset 00000000030d4200 with rc 1, read time is 11146 ms,

It reports that it took 11 seconds to read 50 MB of changes, equating to less than 5 MB/sec. The other provided examples took around 3 seconds for 15MB/sec. An OK speed could be 50MB/sec and a good speed is 150MB/sec. The 5 - 15 MB range is slow, but most importantly you need to figure out whether it is fast enough to 'keep up' . You can do this by issuing queries against V$ARCHIVED_LOG and group by DAY or by HOUR. Now divide by 86400 resp 3600 and convert to MB to see how much speed is minimally needed just to read everything without falling behind long term. Add 50% to the required speed to cope with peaks without too much delay. Sample query:

select  to_char(FIRST_TIME, 'YYYY-MM-DD')  "DATE",
TO_CHAR(SUM((blocks*block_size)/(1024*1024*1024)), '999,999,999,999') "GIGABYTES",
TO_CHAR(SUM((blocks*block_size)/(1024*1024*86400)), '999,999,999,999') "MB/Sec"
from v$archived_log where FIRST_TIME > sysdate - 10
group by to_char(FIRST_TIME, 'YYYY-MM-DD')
ORDER BY to_char(FIRST_TIME, 'YYYY-MM-DD') DESC

Now let's look at the time-stamps only to verify -

09:57:31:694831 [PERFORMANCE ]T: Start read
09:57:42:841387 [PERFORMANCE ]T: Completed 
09:57:42:841387 [PERFORMANCE ]T: Start read
09:57:45:691785 [PERFORMANCE ]T: Completed 
09:57:45:744298 [PERFORMANCE ]T: Start read
09:57:49:312890 [PERFORMANCE ]T: Completed

It's easy to confirm those 11 seconds, the almost 3 and the just over 3 read time. Agreed? And you also see that Replicate turned 'right around' maybe 50 mSec later to request the next chunk. Therefor the delay does NOT appear to be in Replicate processing what it read.

So now you have to work with the DBA - Find how much REDO / hour or /day is being produced. Learn about the Archive storage device performance, learn about what he/she sees in the Replicate reader threads - they are not 'niced' down are they? Learn about the network throughput characteristics. You may even have to work with a network person to find out whether the network cannot deliver fast enough or whether the network is not handed the data quickly enough by the Oracle thread(s) honoring Replicate requests.

Good luck!

Hein.

Dana_Baldwin · ‎2023-06-01

Hi @nabeelaslam1994

Your latency is due to intermittently slow reads from the source. Please note, target latency is always the combination of source latency and target latency. Since they are equal, all the latency is on the source side.

At worst the task is able to read only 15.58 MB in half a second. Good read speed would be 50 MB in half a second.

51.2 / 3.286 = 15.58

Please check with your source DBA on why the read speed is slow. Network performance can also have an impact.

Hope this helps!

If one of the replies helped to resolve the issue, please mark it as a solution & like it to make it easier for others to find.
Thanks in advance!

Dana

Latency Keeps Building Up - Oracle Connection

Best Practices

CDC - 1-many - Log Stream

Configuration

Errors - Unexpected Behavior

General Question