Skip to main content
Announcements
UPGRADE ADVISORY for Qlik Replicate 2024.5: Read More
cancel
Showing results for 
Search instead for 
Did you mean: 
msantiago
Contributor

ORACLE CDC ISSUE

Hi Everyone,

We are new to Qlik, and we are using Qlik enterprise manager. Currently we are experiencing CDC issue with Oracle as a source and as comparison we also have a similar connection using Oracle as well. This is a logstream task only by the way.

We also have same systems running on the same machine but we don't experience latency with those. 

Attaching reptask log.

 

msantiago_0-1695981117889.png

 

Appreciate the inputs and help on this.

Labels (2)
4 Replies
Heinvandenheuvel
Specialist III

@msantiago  Welcome to the Replicate forum.

So what is the difference between the TSY SO and TSYSP tasks? They appear to be running on the same server. Are they connecting to the same database? If so, you should consider a single logstream task. If not, are the databases configured differently with respect to REDO and ARCHIVE, like ASM vs network file system?

Note: a discussion for this was previously started in the AEM forum, but this is of course a Replicate Task/Issuey managed by AEM. In the task log attached in the AEM forum I noticed and significant issues with the Oracle connection. So this may well be an infrastructure issue. If interested, see: https://community.qlik.com/t5/Qlik-Enterprise-Manager/ORACLE-CDC-ISSUE/td-p/2123395

The attached file is a .RAR which can be opened with a (free trial) WinRAR. my 7-Zip did not like it. it expands to 136 MB mostly dues to useless SORTER lines 

What I see in the attached log is a really bad connection - or is that like an network tcp-idle-timeout? Anything similar in the successful tasks? Doing a GREP for reconnect|reestablish results in showing issues every hour or so.

-28T20:01:51 [SOURCE_CAPTURE  ]I:  Trying to reconnect to the source database
-28T21:07:41 [SOURCE_CAPTURE  ]I:  The connection to the source database reestablished
-28T22:47:32 [SOURCE_CAPTURE  ]I:  Trying to reconnect to the source database
-28T23:53:22 [SOURCE_CAPTURE  ]I:  The connection to the source database reestablished
-29T01:38:45 [SOURCE_CAPTURE  ]I:  Trying to reconnect to the source database
-29T02:44:26 [SOURCE_CAPTURE  ]I:  The connection to the source database reestablished
-29T04:36:27 [SOURCE_CAPTURE  ]I:  Trying to reconnect to the source database
-29T05:42:12 [SOURCE_CAPTURE  ]I:  The connection to the source database reestablished
-29T07:19:49 [SOURCE_CAPTURE  ]I:  Trying to reconnect to the source database
-29T08:25:30 [SOURCE_CAPTURE  ]I:  The connection to the source database reestablished
-29T10:03:23 [SOURCE_CAPTURE  ]I:  Trying to reconnect to the source database
-29T11:09:08 [SOURCE_CAPTURE  ]I:  The connection to the source database reestablished
-29T12:38:00 [SOURCE_CAPTURE  ]I:  Trying to reconnect to the source database
-29T13:43:45 [SOURCE_CAPTURE  ]I:  The connection to the source database reestablished
-29T15:27:59 [SOURCE_CAPTURE  ]I:  Trying to reconnect to the source database

When I go over the log with a script looking for 'PERFORMANCE' lines it reports:

Task <TSYSP_ORACLE_LOGSTREAM> from 09/28 18:29:28 thru 09/29 15:55:15 ( = 77147 seconds).
There were 78 reads reported (1378 seconds), with 73 at the maximum IO size of 50000 KB.
Grand total of 3.6 GB was read = 0.2 GB/hour (3.6 GB ARCH @ 205MB/Archive) 0 online, 18 archived Redologs starts
Overall 48498 KB per read at 0.0 MB/sec. 2.7 MB/sec using active read time. Avg 17676 mSec/read.
Longest read time = 24920 ms for 41511 kb for thread 1 at 09/29 14:05:51.
Most read in 300 second interval = 830 MB --> 2.8 MB/sec at 09/29 13:50:07.
2 Oracle RAC threads detected.
RAC Thread 1 -    2.9 GB,   2.7 MB/sec, 63 reads.12 logs ( 470759 .. 470770 )
RAC Thread 2 -    0.7 GB,   2.6 MB/sec, 15 reads.6 logs ( 338922 .. 338927 )

 

Typically one expects to see 50MB/sec or better for reads, not 3MB/sec = 10GB/Hour which is what some database produce as redo/hour.

john_wang
Support

Hello @msantiago ,

Besides @Heinvandenheuvel comments:

1. The was task set to access archived redo log file only, see line #28:

     2023-09-28T18:29:36 [SOURCE_CAPTURE ]I: Oracle CDC will access Archived Redo logs only

     The latency is reasonable with this setting, because of the near-real time changes in ONLINE redo log files will be held on until the changes were archived then Replicate start to process these changes.

2. There are 8 times repeat error eg:

     2023-09-29T08:57:38 [SOURCE_CAPTURE ]T: OCI error 'ORA-03113: end-of-file on communication channel

     The errors happened during16 minutes to 5 hours. Certainly it may aggravate the latency.

      There are some articles 'ORA-03113' about in community, however many different reasons may lead to the same error, please work with the Oracle DBA to fix it.

      -- Qlik Replicate: Failed to read from ASM file using PLSQL with thread id '1' from block number

      -- Replicate ORA-03113 Error

      -- Qlik Replicate: Failed to execute statement 'select distinct(tzname)...from V$TIMEZONE_NAMES' 

Good luck,

Regards,

John.

Help users find answers! Do not forget to mark a solution that worked for you! If already marked, give it a thumbs up!
Heinvandenheuvel
Specialist III

Good catch @john_wang  : "[SOURCE_CAPTURE ]I: Oracle CDC will access Archived Redo logs only"

To clarify, this means that if there are changes, those are only seen when a Redo gets rolled into an Archive which appears to happen every 800 MB or so, but we don't know at what schedule.

With this setting Between Archive logs appearing one must expect a second of latency for every second until a new one appears - the age of the last Archive log. That's the choice made and the latency is the price to pay to run more efficiently (no constant reading of the Redo tail).

@msantiago  you need to review the Archive Log creation using queries like :

select THREAD#, STATUS, DELETED, FIRST_TIME, NEXT_TIME
from v$archived_log
where FIRST_TIME > SYSDATE - 5

If there are gaps of hours or days, then you may want to work with the source DBA to setup a SWITCHLOG every hour or so, whether the DB needs it or not, just to limit your latency to an hour.

Also, with big multi-hour gaps there may well be 'helpers' in Oracle or Network land which try to clean up seemingly idle session which would explain the (forced) disconnects. Work with network and/or DBA to figure that out.

Or.... KISS - just do not enable the "Use archived redo logs only" mode. Why was that set anyway?

Hein.

Hein.

 

 

msantiago
Contributor
Author

Hi @Heinvandenheuvel , @john_wang 

Coordinated with our DBA and applied the recommended settings from the topics related from above thread. Still having the same issue on latency. Attaching the DP and logs hoping for your help on this.

Thanks,

Mike