ORACLE CDC ISSUE

msantiago · ‎2023-09-29

Hi Everyone,

We are new to Qlik, and we are using Qlik enterprise manager. Currently we are experiencing CDC issue with Oracle as a source and as comparison we also have a similar connection using Oracle as well. This is a logstream task only by the way.

We also have same systems running on the same machine but we don't experience latency with those.

Attaching reptask log.

Appreciate the inputs and help on this.

Heinvandenheuvel · ‎2023-09-29

@msantiago Welcome to the Replicate forum.

So what is the difference between the TSY SO and TSYSP tasks? They appear to be running on the same server. Are they connecting to the same database? If so, you should consider a single logstream task. If not, are the databases configured differently with respect to REDO and ARCHIVE, like ASM vs network file system?

Note: a discussion for this was previously started in the AEM forum, but this is of course a Replicate Task/Issuey managed by AEM. In the task log attached in the AEM forum I noticed and significant issues with the Oracle connection. So this may well be an infrastructure issue. If interested, see: https://community.qlik.com/t5/Qlik-Enterprise-Manager/ORACLE-CDC-ISSUE/td-p/2123395

The attached file is a .RAR which can be opened with a (free trial) WinRAR. my 7-Zip did not like it. it expands to 136 MB mostly dues to useless SORTER lines

What I see in the attached log is a really bad connection - or is that like an network tcp-idle-timeout? Anything similar in the successful tasks? Doing a GREP for reconnect|reestablish results in showing issues every hour or so.

-28T20:01:51 [SOURCE_CAPTURE  ]I:  Trying to reconnect to the source database
-28T21:07:41 [SOURCE_CAPTURE  ]I:  The connection to the source database reestablished
-28T22:47:32 [SOURCE_CAPTURE  ]I:  Trying to reconnect to the source database
-28T23:53:22 [SOURCE_CAPTURE  ]I:  The connection to the source database reestablished
-29T01:38:45 [SOURCE_CAPTURE  ]I:  Trying to reconnect to the source database
-29T02:44:26 [SOURCE_CAPTURE  ]I:  The connection to the source database reestablished
-29T04:36:27 [SOURCE_CAPTURE  ]I:  Trying to reconnect to the source database
-29T05:42:12 [SOURCE_CAPTURE  ]I:  The connection to the source database reestablished
-29T07:19:49 [SOURCE_CAPTURE  ]I:  Trying to reconnect to the source database
-29T08:25:30 [SOURCE_CAPTURE  ]I:  The connection to the source database reestablished
-29T10:03:23 [SOURCE_CAPTURE  ]I:  Trying to reconnect to the source database
-29T11:09:08 [SOURCE_CAPTURE  ]I:  The connection to the source database reestablished
-29T12:38:00 [SOURCE_CAPTURE  ]I:  Trying to reconnect to the source database
-29T13:43:45 [SOURCE_CAPTURE  ]I:  The connection to the source database reestablished
-29T15:27:59 [SOURCE_CAPTURE  ]I:  Trying to reconnect to the source database

When I go over the log with a script looking for 'PERFORMANCE' lines it reports:

Task <TSYSP_ORACLE_LOGSTREAM> from 09/28 18:29:28 thru 09/29 15:55:15 ( = 77147 seconds).
There were 78 reads reported (1378 seconds), with 73 at the maximum IO size of 50000 KB.
Grand total of 3.6 GB was read = 0.2 GB/hour (3.6 GB ARCH @ 205MB/Archive) 0 online, 18 archived Redologs starts
Overall 48498 KB per read at 0.0 MB/sec. 2.7 MB/sec using active read time. Avg 17676 mSec/read.
Longest read time = 24920 ms for 41511 kb for thread 1 at 09/29 14:05:51.
Most read in 300 second interval = 830 MB --> 2.8 MB/sec at 09/29 13:50:07.
2 Oracle RAC threads detected.
RAC Thread 1 -    2.9 GB,   2.7 MB/sec, 63 reads.12 logs ( 470759 .. 470770 )
RAC Thread 2 -    0.7 GB,   2.6 MB/sec, 15 reads.6 logs ( 338922 .. 338927 )

Typically one expects to see 50MB/sec or better for reads, not 3MB/sec = 10GB/Hour which is what some database produce as redo/hour.

john_wang · ‎2023-09-29

Hello @msantiago ,

Besides @Heinvandenheuvel comments:

1. The was task set to access archived redo log file only, see line #28:

2023-09-28T18:29:36 [SOURCE_CAPTURE ]I: Oracle CDC will access Archived Redo logs only

The latency is reasonable with this setting, because of the near-real time changes in ONLINE redo log files will be held on until the changes were archived then Replicate start to process these changes.

2. There are 8 times repeat error eg:

2023-09-29T08:57:38 [SOURCE_CAPTURE ]T: OCI error 'ORA-03113: end-of-file on communication channel

The errors happened during16 minutes to 5 hours. Certainly it may aggravate the latency.

There are some articles 'ORA-03113' about in community, however many different reasons may lead to the same error, please work with the Oracle DBA to fix it.

-- Qlik Replicate: Failed to read from ASM file using PLSQL with thread id '1' from block number

-- Replicate ORA-03113 Error

-- Qlik Replicate: Failed to execute statement 'select distinct(tzname)...from V$TIMEZONE_NAMES'

Good luck,

Regards,

John.

Help users find answers! Do not forget to mark a solution that worked for you! If already marked, give it a thumbs up!

Heinvandenheuvel · ‎2023-09-29

Good catch @john_wang : "[SOURCE_CAPTURE ]I: Oracle CDC will access Archived Redo logs only"

To clarify, this means that if there are changes, those are only seen when a Redo gets rolled into an Archive which appears to happen every 800 MB or so, but we don't know at what schedule.

With this setting Between Archive logs appearing one must expect a second of latency for every second until a new one appears - the age of the last Archive log. That's the choice made and the latency is the price to pay to run more efficiently (no constant reading of the Redo tail).

@msantiago you need to review the Archive Log creation using queries like :

select THREAD#, STATUS, DELETED, FIRST_TIME, NEXT_TIME
from v$archived_log
where FIRST_TIME > SYSDATE - 5

If there are gaps of hours or days, then you may want to work with the source DBA to setup a SWITCHLOG every hour or so, whether the DB needs it or not, just to limit your latency to an hour.

Also, with big multi-hour gaps there may well be 'helpers' in Oracle or Network land which try to clean up seemingly idle session which would explain the (forced) disconnects. Work with network and/or DBA to figure that out.

Or.... KISS - just do not enable the "Use archived redo logs only" mode. Why was that set anyway?

Hein.

msantiago · ‎2023-10-05

Hi @Heinvandenheuvel , @john_wang

Coordinated with our DBA and applied the recommended settings from the topics related from above thread. Still having the same issue on latency. Attaching the DP and logs hoping for your help on this.

Thanks,

Mike

Best Practices

Connectivity - Sources or Targets

ORACLE CDC ISSUE

Best Practices

Connectivity - Sources or Targets

Related Topics