Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
jonna
Partner - Creator
Partner - Creator

CDC job against Oracle db fails intermittently, with no apparent pattern

Hi folks

 

From time to time we have a CDC task failing - the error is "Failed to get source column def for ids[6]=200760456. [1021803] " We presumed was related to DDL changes at either the source or at target but from our interrogation of  the log file entries and hunting on the 'net  we havent been able to find  any potential ideas as to what might be happening. 

The particulars are as follows

 

  • We have a CDC task extracting from an Oracle source endpoint
  • The task has no filtering or transformation - it is simply extract then load into a SQL Server database
  • When the task runs it  reads from a single  table, which has 92 columns. The columns are of regular string/date/numeric types and there are no LOBs. 
  • Full load processing is also activated for the task as well as CDC
  • The source is a well established database and hasn't changed in format since c2012

 

We have extracted the log into an Excel spreadsheet - I have highlight the entries closest to the error, they are near line  1183.

 

Please help if you can

Cheers, jonna

Labels (2)
1 Solution

Accepted Solutions
Dana_Baldwin
Support
Support

Hi @jonna 

What version of Qlik Replicate are you using? I found a software issue related to this message that there may be a patch for. Other than that, we use a separate connection to Oracle source for metadata purposes, and the session could have been idle long enough for it to expire.

There is a setting for the Oracle OCI APIs calls that will tell it to create the communication socket to the server with the TCP/IP keep-alive option on.  When the TCP/IP keep-alive option is on, the TCP/IP system will send keep-alive packets to the Oracle server from time to time. If a keep-alive packet will fail to send its packets a few times, it will then force the closure of the socket, and thus force the OCI API call to fail and give an error, and not just hang.

So, turning on the Keep-Alive option can be a solution for the issue. In addition, keep-alive time might also need to be changed (the default on Windows is 2 hours), so that when an Oracle OCI hang happens, Replicate will be released from the hang, and give an error relatively quick.

On the replicate machine, edit the local tnsname.ora file that is used by the Oracle client that is installed. The entry should include the setting (ENABLE=BROKEN):

ORCL_REPLICATE =

  (DESCRIPTION =

   (ENABLE=BROKEN)

     (ADDRESS_LIST =

      (ADDRESS = (PROTOCOL = TCP) (HOST = oracle_server.company.com)(PORT = 1521))

    )

    (CONNECT_DATA =

      (SERVICE_NAME = orcl)

    )

  )

Reduce Keep-Alive Values:  this is an optional step, but a recommended one. Since the TCP/IP keep-alive default values can be quite long (2 hours on Windows).

Please refer to OS documentation on how to reduce timeout values.

Thanks,

Dana

View solution in original post

5 Replies
Dana_Baldwin
Support
Support

Hi @jonna 

What version of Qlik Replicate are you using? I found a software issue related to this message that there may be a patch for. Other than that, we use a separate connection to Oracle source for metadata purposes, and the session could have been idle long enough for it to expire.

There is a setting for the Oracle OCI APIs calls that will tell it to create the communication socket to the server with the TCP/IP keep-alive option on.  When the TCP/IP keep-alive option is on, the TCP/IP system will send keep-alive packets to the Oracle server from time to time. If a keep-alive packet will fail to send its packets a few times, it will then force the closure of the socket, and thus force the OCI API call to fail and give an error, and not just hang.

So, turning on the Keep-Alive option can be a solution for the issue. In addition, keep-alive time might also need to be changed (the default on Windows is 2 hours), so that when an Oracle OCI hang happens, Replicate will be released from the hang, and give an error relatively quick.

On the replicate machine, edit the local tnsname.ora file that is used by the Oracle client that is installed. The entry should include the setting (ENABLE=BROKEN):

ORCL_REPLICATE =

  (DESCRIPTION =

   (ENABLE=BROKEN)

     (ADDRESS_LIST =

      (ADDRESS = (PROTOCOL = TCP) (HOST = oracle_server.company.com)(PORT = 1521))

    )

    (CONNECT_DATA =

      (SERVICE_NAME = orcl)

    )

  )

Reduce Keep-Alive Values:  this is an optional step, but a recommended one. Since the TCP/IP keep-alive default values can be quite long (2 hours on Windows).

Please refer to OS documentation on how to reduce timeout values.

Thanks,

Dana

jonna
Partner - Creator
Partner - Creator
Author

Hi @Dana_Baldwin , thanks for getting back to me so quickly, most appreciated  👍 

 

We are on Replicate version is 2021.5.0.1272, hoping a patch may help out.

 

For the other bits and pieces you talked about I have passed the information on to the server admin team to get some resource to assess the server changes.  I will keep this post updated as and when they make the changes.

 

Thanks

Jonna

 

Dana_Baldwin
Support
Support

Hi @jonna 

The fix I found is already included in the build you are currently running. Thanks for keeping us posted on the other suggested changes.

Dana

jonna
Partner - Creator
Partner - Creator
Author

Good morning

 

Quick update: we are still awaiting some intervention from the server admin team. Unfortunately nothing more to report at this time.

 

Cheers, jonna

Dana_Baldwin
Support
Support

Thanks for the update @jonna