AgusBudianto
Contributor III

How to add an auto-increment column in a Databricks Lakehouse (Delta) target
Hello,

I am looking for some help related to transformations. I want to add a column to the target data, say "RECORD_ID", and have it auto-increment, so that every insert in the source data produces 1, 2, 3, and so on in the target.

We tried adding an identity column on Azure SQL Server, configured as RECORD_ID BIGINT IDENTITY(1,1) NOT NULL, and it works as expected.
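
For reference, the working Azure SQL Server definition was equivalent to this sketch (the table name and the other columns are illustrative, not from the original task):

CREATE TABLE dbo.target_table (
    id        INT NOT NULL,                   -- business key replicated from the source
    payload   NVARCHAR(100) NULL,             -- illustrative data column
    RECORD_ID BIGINT IDENTITY(1,1) NOT NULL   -- SQL Server generates 1, 2, 3, ... on each insert
);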

But with a target table in Azure Databricks Lakehouse (Delta), configured with the column RECORD_ID NOT NULL GENERATED BY DEFAULT AS IDENTITY (START WITH 1 INCREMENT BY 1), the task fails with the error message "[DELTA_NOT_NULL_CONSTRAINT_VIOLATED] NOT NULL constraint violation for column: RECORD_ID."
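
The failing Databricks Delta definition was equivalent to the following sketch (Delta identity columns must be BIGINT; the table name and the other columns are again illustrative):

CREATE TABLE target_table (
    id        INT NOT NULL,  -- business key replicated from the source
    payload   STRING,        -- illustrative data column
    RECORD_ID BIGINT NOT NULL GENERATED BY DEFAULT AS IDENTITY (START WITH 1 INCREMENT BY 1)
);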

 

Previously, I asked in the following post: https://community.qlik.com/t5/Qlik-Replicate/How-to-transformation-column-auto-increment-in-target-d...

 

Thank you,

Anto


8 Replies
OritA
Support

Hi, 

The recommendations provided in your previous community entry are still in effect. What is the number of the case that you have opened? Please attach the log with the error to the case so we can see exactly when the error is manifested and help troubleshoot the problem.

 

Regards,

Orit 

 

DesmondWOO
Support

Hi @AgusBudianto ,

As the value of an IDENTITY column is generated by the database, the target table should be created manually and the RECORD_ID column should not be mapped on the transformation page.

I can insert records into the target table; however, RECORD_ID is still NULL. I think it is a limitation. Please create a support ticket and let us check with the R&D team.
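
As a minimal sketch of the expected Delta behavior (names follow the illustrative DDL earlier in the thread): an INSERT that omits the identity column lets Databricks generate the value, which is why the column must not be mapped:

INSERT INTO target_table (id, payload)
VALUES (123, 'first version');
-- RECORD_ID is omitted, so Delta assigns the next identity value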

Regards,
Desmond 

AgusBudianto
Contributor III
Author

Hi @OritA and @DesmondWOO,

We created a ticket with Qlik Support, and the result is that there is indeed a limitation for Azure Databricks Delta table targets; they recommend using the randomblob(N) or random() function instead.
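
For illustration, a new-column expression in the Replicate expression builder (which uses SQLite syntax) could be sketched as either of the following:

random()              -- random signed 64-bit integer; not sequential, collisions unlikely but possible
hex(randomblob(16))   -- 32-character random hex string built from 16 random bytes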

Please note that we use record_id to detect the last update: for example, if one ID updates its data several times, we take the latest data based on the highest record_id, i.e. sorted by record_id.

However, we then found a new problem on the SQL Server target, which previously had no issue adding record_id. When we checked its contents, the record_id values were out of order: for example, we did an insert on ID 123, which should get record_id 1, then an update on ID 123, which should get record_id 2, but the values were reversed. This makes it unusable for capturing changes; perhaps it is because we use the "Batch optimized apply" method on the task.

In the end we replaced record_id with the current time, in the hope of being able to take the latest row by time.

 

Regards

Anto

john_wang
Support

Hello Anto, @AgusBudianto 

Thanks for the update.

It seems that using the current timestamp is the best approach to identify the latest updated rows. In Databricks, a column defined as GENERATED BY DEFAULT AS IDENTITY cannot be directly updated for existing data. Additionally, it cannot be populated when the LOAD method is used to insert records into the table. It can be populated through INSERT operations, but Qlik Replicate does not support transactional apply mode for the Databricks endpoint.
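
As a minimal sketch of the first limitation, using the illustrative table from earlier in the thread:

-- Delta rejects direct assignment to an identity column on existing rows
UPDATE target_table
SET RECORD_ID = 99
WHERE id = 123;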

Regards,

John.

AgusBudianto
Contributor III
Author

Thanks for the response and explanation @DesmondWOO and @OritA.

Thanks @john_wang for confirming that the method we used before is the best way. We have a CDC task using the Store Changes processing mechanism, and we add RECORD_ID via a global rule: "Add a column for column data type %.%", with column 'RECORD_ID' STRING(50), subtype Regular, and the expression $AR_H_TIMESTAMP. We then retrieve the latest updated data based on 'RECORD_ID'.
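
For completeness, a sketch of how the latest row per key can then be selected on the Databricks side (the table and key names are illustrative, and this assumes the $AR_H_TIMESTAMP strings sort chronologically, which their fixed-width format allows):

SELECT *
FROM target_table
QUALIFY ROW_NUMBER() OVER (PARTITION BY id ORDER BY RECORD_ID DESC) = 1;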

 

Regards,

Anto

john_wang
Support

Hello Anto @AgusBudianto, cc @OritA, @DesmondWOO

Thank you so much for the feedback. It's helpful for all of us!

Best Regards,

John.

bantuh2
Contributor

Hi @john_wang, I am interested in this topic.

During CDC, Qlik Replicate should add an incrementing sequence number as each record is inserted into the table. Please suggest how we can achieve this. By the way, we are replicating from DB2 for LUW to AlloyDB, and we have to generate a unique primary key for the target AlloyDB table.

The standard AUTO_INCREMENT feature in AlloyDB is not a good option because it automatically applies a NOT NULL constraint, which is causing our replication to fail.

Can you advise on this?

john_wang
Support

Hello @bantuh2 ,

For AlloyDB/PostgreSQL databases, a SERIAL column should be used. Please check my comments in the article load target table with new identity column (GENERATED BY DEFAULT).
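
A minimal sketch of such a definition (the table and the other columns are illustrative; SERIAL/BIGSERIAL also implies NOT NULL, but the constraint is satisfied by the sequence default as long as the column is left unmapped in the task):

CREATE TABLE target_table (
    id        INT NOT NULL,  -- business key replicated from the source
    payload   TEXT,          -- illustrative data column
    record_id BIGSERIAL      -- PostgreSQL fills this from a sequence on each insert
);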

Hope it helps.

John.
