eksmirnova
Contributor III

Error reading _CT Parquet files created by Qlik in an Azure ADLS Gen2 target

Hello,

We are replicating data from DB2 to an ADLS Gen2 container. Our setup is DB2 -> LogStream -> ADLS.

The task runs Full Load + Store Changes with the Parquet file format.

We are able to read the Full Load files without any issues, but we get an error while reading the _CT files:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 11232.0 failed 4 times, most recent failure: Lost task 0.3 in stage 11232.0 (TID 76655) (10.241.17.215 executor 19): java.io.IOException: Could not read or convert schema for file: abfss://PATH/TABLE_NAME_ct/20240320-145022523.snappy.parquet

The header__change_mask column is causing this issue.

If I uncheck "change_mask" under Change Table Header Columns, we are able to read the file.

How can we solve this problem while keeping the header__change_mask column?


5 Replies
john_wang
Support

Hello @eksmirnova,

Thanks for reaching out to Qlik Community!

Are you able to confirm whether NULL change_mask values (or some other pattern) caused the error? In any case, please open a support ticket and attach:

1- The task diagnostics packages

2- Details of how you get the error "Could not read or convert schema for file" (the command or tool used, etc.)

Thanks,

John.

eksmirnova
Contributor III
Author

Some of the change_mask values are NULL, and some contain actual values:

[screenshot of sample change_mask values: eksmirnova_0-1710952172864.png]

 

Our assumption is that the FIXED_LEN_BYTE_ARRAY data type is causing the issue.

We are getting this error in Databricks when running:

spark.read.parquet("abfss://PATH/TABLE_NAME_ct/20240319-150053139.snappy.parquet")

[screenshot of the resulting error: eksmirnova_1-1710952389151.png]
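
In case it helps with diagnosis: the physical type can also be confirmed straight from the Parquet footer with pyarrow. This is only a sketch, assuming the _CT file has been copied locally first (pyarrow cannot open abfss:// paths unless an fsspec/adlfs filesystem is configured); the file name is the one from the read above:

import pyarrow.parquet as pq

# Sketch: inspect the Parquet footer of a _CT file to confirm the physical
# type of header__change_mask. Assumes the file was downloaded locally.
pf = pq.ParquetFile("20240319-150053139.snappy.parquet")
print(pf.schema)        # raw Parquet schema; shows FIXED_LEN_BYTE_ARRAY
print(pf.schema_arrow)  # the same schema as pyarrow/Arrow interprets it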

 

DesmondWOO
Support
Support

Hi @eksmirnova,

The change_mask values are NULL because those are BEFORE-IMAGE records. This is normal.

Regards,
Desmond

eksmirnova
Contributor III
Author

@DesmondWOO yes, I know. This is not the issue.

The issue is that Databricks is not able to read the _CT Parquet file because of the change_mask column's data type. If I uncheck the "change_mask" header column, we are able to read the file.

 

eksmirnova
Contributor III
Author

Posting this here because I think it might be helpful for somebody.

We found two options:

1) Remove the "change_mask" header column from the target files:

[screenshot of the header column setting: eksmirnova_0-1711055682212.png]

2) Set the internal parameter byteNotFixedLenType to true, as described here: Qlik Replicate: header__change_mask column value p... - Qlik Community - 2103612

We decided to go with option #1 because, in our case, nobody was using the header__change_mask column.
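
As a quick sanity check after applying either option (a sketch, not something from the original task; PATH and TABLE_NAME_ct are the placeholders used earlier, and spark is the Databricks notebook's built-in session):

# Re-read the newly generated _CT files after applying option 1 or 2.
df = spark.read.parquet("abfss://PATH/TABLE_NAME_ct/")

# Option 1: header__change_mask should be gone from the schema.
# Option 2: it should now read as binary (BYTE_ARRAY instead of
# FIXED_LEN_BYTE_ARRAY), so the schema conversion no longer fails.
df.printSchema()
df.show(5)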