Solved: Oracle DB to Azure Parquet Files Query - Qlik Community

RamdasP · ‎2022-10-21

Hi

We have a replicate task that reads from Oracle Archive Logs & writes to Parquet files on Azure. The Parquet files are then processed by Qlik Compose and written to the ODS layer. We reconciled our Azure ODS layer with the source data recently and found that the Azure ODS layer had a few records that was not in the source tables nor was it deleted from the source tables.

Here is what I think could have happened.

A bulk insert operation was done in the source table.
This bulk insert was logged in the archive log 1 which got full and was shipped to the Qlik replicate server.
A rollback happened in the source table. So the data does not exist in the source table anymore.
The rollback was logged in archive log 2

My question is

While archive logs are being processed by Qlik Replicate into Parquet files does it wait until it hits a rollback or commit before it writes to the Parquet files.

Kind Regards

Ramdas

Heinvandenheuvel · ‎2022-10-21

I'm afraid your suggested explanation is not reasonable.

Replicate always waits for source transaction commits before sending rows onwards. The GUI makes that pretty clear.

It is not relevant whether the commit is detected in the same or a subsequent (archived or not) transaction long.

On the target side the source transaction may be merged or split up. If anything goes wrong there, the target table should become suspended awaiting reload.

You may want to scan the reptask_xxx.log file for lines with ]E: or ]W: to detect potential issues which had not been otherwise observed. Regular expression: /\][EW]:\s/

Hein.

View solution in original post

Heinvandenheuvel · ‎2022-10-21

I'm afraid your suggested explanation is not reasonable.

Replicate always waits for source transaction commits before sending rows onwards. The GUI makes that pretty clear.

It is not relevant whether the commit is detected in the same or a subsequent (archived or not) transaction long.

On the target side the source transaction may be merged or split up. If anything goes wrong there, the target table should become suspended awaiting reload.

You may want to scan the reptask_xxx.log file for lines with ]E: or ]W: to detect potential issues which had not been otherwise observed. Regular expression: /\][EW]:\s/

Hein.

RamdasP · ‎2022-10-21

Hi Hein,

Thx for your quick response.

Can you think of any potential reason as to how records are in the Azure ODS layer but not in the source tables. We are sure that no records were deleted from the source tables.

Kind Regards

Ramdas

Steve_Nguyen · ‎2022-10-21

try to review the log as Hein advise:

You may want to scan the reptask_xxx.log file for lines with ]E: or ]W: to detect potential issues which had not been otherwise observed. Regular expression: /\][EW]:\s/

Help users find answers! Don't forget to mark a solution that worked for you! If already marked, give it a thumbs up!

RamdasP · ‎2022-10-26

Hi Hein,

I asked our admin to review the reptask_xxx.log file for lines with ]E: or ]W & these are the warnings he found.

LOB lookup: row was already deleted for the table CISADM.CI_SP, PK values: (oracle_endpoint_unload.c:1150)

Column 'SP_ID': '~{AQAAANuk8STUXwr26mrSyEV9l68=}~' (oracle_endpoint_unload.c:933)

Do you have any other suggestion for how records are in the Azure ODS layer but not in the source tables?

Any other suggestions will be highly appreciated.

Kind Regards

Ramdas

Heinvandenheuvel · ‎2022-10-27

No concrete suggestions for now. You may need to request some consulting to review configuration and such as this goes likely beyond a support call and is unlikely (but not impossibly) related to a product defect.

As for the "LOB lookup: row was already deleted" - that's a timing issue as a consequence of the Replicate CDC design. Replicate does NOT initially ingest the possibly large LOB column as they may well be optimized away with filter settings, column drops, operation aggregation (insert + update => single insert). The current design first waits to see if the LOB column ended up as a candidate for a row to be applied. If so, the target_apply thread requests the source modules to go fetch the lob ('Just in Time' delivery?). This will typically be seconds after the source change was committed. But it could be hours depending on design choices and performance options. A subsequent transaction may have deleted the row. The lob fetch failing is considered just a warning because supposedly in a short while, then the change processing catches up, the whole row will be gone anyway.

Hein.

Oracle DB to Azure Parquet Files Query

Functionality