eksmirnova
Contributor III

Error reading _CT Parquet files created by Qlik in an Azure ADLS Gen2 target

Hello,

We are replicating data from DB2 to an ADLS Gen2 container. Our setup is DB2 -> LogStream -> ADLS.

The task runs Full Load + Store Changes with the Parquet file format.

We are able to read the Full Load files without any issues, but we get an error while reading the _CT files:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 11232.0 failed 4 times, most recent failure: Lost task 0.3 in stage 11232.0 (TID 76655) (10.241.17.215 executor 19): java.io.IOException: Could not read or convert schema for file: abfss://PATH/TABLE_NAME_ct/20240320-145022523.snappy.parquet

The header__change_mask column is causing this issue.

If I uncheck "change_mask" under Change Table Header Columns, we are able to read the file.

How can we solve this problem while keeping the header__change_mask column?


5 Replies
john_wang
Support

Hello @eksmirnova,

Thanks for reaching out to Qlik Community!

Are you able to confirm whether NULL change_mask values (or some other pattern) caused the error? In any case, please open a support ticket and attach:

1- The task diagnostics packages

2- Details of how you get the error "Could not read or convert schema for file" (the command or tool used, etc.)

Thanks,

John.

eksmirnova
Contributor III
Author

Some of the change_mask values are NULL, and some contain actual values:

[screenshot of sample change_mask values: eksmirnova_0-1710952172864.png]

 

Our assumption is that the FIXED_LEN_BYTE_ARRAY data type is causing the issue.

We are getting this error in Databricks when running:

spark.read.parquet("abfss://PATH/TABLE_NAME_ct/20240319-150053139.snappy.parquet")

[screenshot of the resulting error: eksmirnova_1-1710952389151.png]
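
In case it helps with diagnosis: the physical type can also be confirmed straight from the Parquet footer with pyarrow. This is only a sketch, assuming the _CT file has been copied locally first (pyarrow cannot open abfss:// paths unless an fsspec/adlfs filesystem is configured); the file name is the one from the read above:

import pyarrow.parquet as pq

# Sketch: inspect the Parquet footer of a _CT file to confirm the physical
# type of header__change_mask. Assumes the file was downloaded locally.
pf = pq.ParquetFile("20240319-150053139.snappy.parquet")
print(pf.schema)        # raw Parquet schema; shows FIXED_LEN_BYTE_ARRAY
print(pf.schema_arrow)  # the same schema as pyarrow/Arrow interprets it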

 

DesmondWOO
Support
Support

Hi @eksmirnova,

The change_mask values are NULL because those are BEFORE-IMAGE records. This is normal.

Regards,
Desmond

eksmirnova
Contributor III
Author

@DesmondWOO yes, I know. This is not the issue.

The issue is that Databricks is not able to read the _CT Parquet file because of the change_mask column's data type. If I uncheck the "change_mask" header column, we are able to read the file.

 

eksmirnova
Contributor III
Author

Posting this here because I think it might be helpful for somebody.

We found two options:

1) Remove the "change_mask" header column from the target files:

[screenshot of the header column setting: eksmirnova_0-1711055682212.png]

2) Set the internal parameter byteNotFixedLenType to true, as described here: Qlik Replicate: header__change_mask column value p... - Qlik Community - 2103612

We decided to go with option #1 because, in our case, nobody was using the header__change_mask column.
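
As a quick sanity check after applying either option (a sketch, not something from the original task; PATH and TABLE_NAME_ct are the placeholders used earlier, and spark is the Databricks notebook's built-in session):

# Re-read the newly generated _CT files after applying option 1 or 2.
df = spark.read.parquet("abfss://PATH/TABLE_NAME_ct/")

# Option 1: header__change_mask should be gone from the schema.
# Option 2: it should now read as binary (BYTE_ARRAY instead of
# FIXED_LEN_BYTE_ARRAY), so the schema conversion no longer fails.
df.printSchema()
df.show(5)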