When I am executing a CDC task for a table which has 1024 partitions, the CDC task is failing with error attrep_apply_exceptions.
Hello @pyspark ,
Welcome to Qlik Community forum and thanks for reaching out here!
It's hard to tell from this information alone. Could you please elaborate on what "1024 partitions" means? Is it a source table (and if so, please let us know the database type)? Also, what is the target endpoint database type/version, and which Replicate version are you using now?
If you can, please set the corresponding components' logging level to Trace (or Verbose) and then check the task log file to understand further (a small log-scanning sketch follows this list):
1- if it's a source-side issue, set SOURCE_UNLOAD/SOURCE_CAPTURE to Trace
2- if it's a target-side issue, set TARGET_LOAD/TARGET_APPLY to Trace
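If it helps, here is a minimal Python sketch for sifting through the task log once the logging level has been raised. The log path and the markers it searches for are assumptions (task logs normally sit under the Replicate data/logs folder), so adjust them to your environment:

```python
from pathlib import Path

# Assumed location of the task log; adjust to your installation's data/logs folder.
LOG_FILE = Path(r"C:\Program Files\Attunity\Replicate\data\logs\my_cdc_task.log")

# Assumed markers of interest: "]E:" / "]W:" typically flag error/warning lines,
# and the component names match the logging categories raised to Trace above.
MARKERS = ("]E:", "]W:", "SOURCE_CAPTURE", "TARGET_APPLY")

def scan_log(log_file: Path, markers=MARKERS):
    """Yield (line number, line) for every log line containing one of the markers."""
    with log_file.open(encoding="utf-8", errors="replace") as fh:
        for line_no, line in enumerate(fh, start=1):
            if any(m in line for m in markers):
                yield line_no, line.rstrip()

if __name__ == "__main__":
    for line_no, text in scan_log(LOG_FILE):
        print(f"{line_no:>8}: {text}")
```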
Regards,
John.
On the source side we have created partitions on the table (for example, table item_loc has 11 tasks and each task contains 100 partitions), and the target is a Synapse stage; we are migrating data from the source (Oracle on-prem) to the target. We are using the Qlik Replicate November 2022 (2022.11.0.475) version.
Hello @pyspark ,
Thanks for the feedback.
Qlik Replicate supports Parallel Load. This mode can be used to accelerate the replication of large tables by splitting the table into segments and loading the segments in parallel. Tables can be segmented by data ranges, by partitions, or by sub-partitions.
For your scenario, the source Oracle database is supported. I cannot find Synapse in the list of supported target endpoints; however, in my test task it seems to work for me, at least Parallel Load is not disabled (if either endpoint is not supported, the Parallel Load function is disabled in the GUI Console). Please give it a try and let me know if it works for you.
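As a side note, if you want to double-check how many segments a Parallel Load "by partitions" would create for a table such as item_loc, a quick way is to count the partitions in the Oracle data dictionary. The following is only an illustrative sketch (it assumes the python-oracledb driver and placeholder connection/schema names) and does not configure Replicate itself; Parallel Load is still set per table in the Replicate console under Table Settings:

```python
# Illustrative sketch: list the partitions of a source table from Oracle's
# ALL_TAB_PARTITIONS dictionary view. Connection details and owner/table names
# are placeholders -- replace them with your own.
import oracledb  # pip install oracledb

DSN = "oracle-host:1521/ORCLPDB1"    # assumed connect string
USER, PASSWORD = "replicate_user", "secret"
OWNER, TABLE = "RMS", "ITEM_LOC"     # assumed schema/table

def list_partitions(owner: str, table_name: str):
    """Return the table's partition names in partition-position order."""
    sql = """
        SELECT partition_name
          FROM all_tab_partitions
         WHERE table_owner = :owner AND table_name = :table_name
         ORDER BY partition_position
    """
    with oracledb.connect(user=USER, password=PASSWORD, dsn=DSN) as conn:
        with conn.cursor() as cur:
            cur.execute(sql, owner=owner, table_name=table_name)
            return [row[0] for row in cur.fetchall()]

if __name__ == "__main__":
    partitions = list_partitions(OWNER, TABLE)
    print(f"{OWNER}.{TABLE} has {len(partitions)} partitions")
```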
Good luck,
John.
>> failing with error attrep_apply_exceptions.
That's not an error, it is a table.
Did you mean to write "failing with errors in attrep_apply_exceptions"?
Well, what do the error and statement columns in that table show for you?
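For example, you could pull the most recent rows from that control table on the Synapse target with something like the sketch below. It assumes the pyodbc driver; the connection string is a placeholder, and the column names (documented as TASK_NAME, TABLE_OWNER, TABLE_NAME, ERROR_TIME, STATEMENT, ERROR) should be verified against the control table created by your Replicate version:

```python
# Illustrative sketch: read the latest apply exceptions from the target database.
# Connection string values are placeholders -- replace with your own.
import pyodbc  # pip install pyodbc

CONN_STR = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myworkspace.sql.azuresynapse.net;"   # assumed Synapse endpoint
    "DATABASE=stagingdb;UID=replicate_user;PWD=secret"
)

SQL = """
SELECT TOP 50 TASK_NAME, TABLE_NAME, ERROR_TIME, STATEMENT, ERROR
FROM attrep_apply_exceptions
ORDER BY ERROR_TIME DESC
"""

with pyodbc.connect(CONN_STR) as conn:
    cursor = conn.cursor()
    for row in cursor.execute(SQL):
        print(row.ERROR_TIME, row.TASK_NAME, row.TABLE_NAME)
        print("  statement:", (row.STATEMENT or "")[:200])
        print("  error    :", row.ERROR)
```

The error column usually points at the exact reason the apply failed (for example a data-type or constraint problem), and the statement column shows the SQL that Replicate tried to run, which narrows the problem down much faster than the task log alone.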
>> On source side table we have created partitions(for eg. table item_loc has 11 tasks each task contains 100 partitions)
That's a lot of tasks for a single table. May we assume that this is using a LOGSTREAM solution to avoid having all those tasks read the redo logs? Are all those tasks using Parallel Load on those partitions and started at a similar time? That would potentially represent a lot of peak work.
Who decided on setting up 10 tasks, and why? What was wrong with a single task - where was more parallelism needed? Source reading, Replicate processing, target data transfer over the network, or target apply? Do those tasks handle other work as well? Was testing and verification done with 1 task, 2 tasks and 5 tasks? I think it is highly unlikely that 10 tasks for a single table is ever a good design. You should probably reconsider a single-task solution.
Still, more than one task may well be needed to allow other work to be processed while, for example, gigabytes of load or change data are being transferred to the target. But surely you tested and measured what can be gained there, because using more tasks is guaranteed to mean more overhead and more resource consumption, and may or may not improve the final speed.
Hein.