
Option to group records that form part of a single transaction when sending to Kafka

padraigdebuitlear
Contributor III

As suggested, I am breaking my previous idea out into separate ideas.

Use case: We want to do CDC from a relational database to Kafka. We want to create a domain-level event from the writes (to multiple tables) that happen as part of a single database transaction. As a result, we need to know the transaction boundaries and the writes that happened within that transaction.

  • To implement the above scenario, we need to capture all records that belong to a single transaction. Doing so with the current functionality in Qlik Replicate requires complex aggregation, potentially repartitioning, and the use of windows (e.g. session windows) in Kafka. Writing to a buffer, as the docs suggest, is not seen as efficient.

What would make life easier is an option in Qlik Replicate to send ALL (selected) records that form part of a single transaction to Kafka in the same payload. My understanding from the documentation is that the product already groups and sorts the records; if we could have the option to receive the entire transaction in a single payload, that would greatly simplify the implementation for the consumer.
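For illustration only, the kind of grouped payload I have in mind might look something like the sketch below; every field name here is hypothetical, not an existing Replicate message format.

```python
# Hypothetical shape of the requested per-transaction payload: one Kafka
# message carrying every change from a single source transaction.
# All field names and values are made up for illustration.
requested_payload = {
    "transactionId": "0000000000ABC123",
    "recordCount": 3,
    "records": [
        {"table": "ORDERS",      "operation": "INSERT", "data": {"order_id": 42}},
        {"table": "ORDER_LINES", "operation": "INSERT", "data": {"order_id": 42, "line": 1}},
        {"table": "AUDIT_LOG",   "operation": "INSERT", "data": {"entity": "ORDERS"}},
    ],
}
```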

6 Comments
Ola_Mayer
Employee

Replicate already provides a solution for managing transaction consistency when working with Kafka.

With streaming endpoints, Replicate sends events (not transactions) in batches. Batches are filled up to the batch size limit of 500 records. Every time we get a new event, we tag it with the required metadata (transaction ID, ordinal number within the transaction, last-event-in-transaction flag) and push it to the currently open batch. If the batch is full, it is sent, and the next event goes to the next batch.
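As a rough sketch of how a consumer could use those tags to reassemble a transaction after delivery: the topic name, connection settings, and all field names other than transactionLastEvent are assumptions, and this assumes all of a transaction's events arrive in order on one partition.

```python
import json
from confluent_kafka import Consumer

# Assumed connection settings and topic name -- adjust to your setup.
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "txn-assembler",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["replicate.cdc.events"])

open_txns = {}  # transaction id -> events collected so far

def handle_domain_event(txn_id, records):
    # Placeholder for the consumer's own domain-event aggregation logic.
    print(f"transaction {txn_id} complete with {len(records)} records")

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    event = json.loads(msg.value())
    txn_id = event["transactionId"]               # assumed field name
    open_txns.setdefault(txn_id, []).append(event)
    if event.get("transactionLastEvent"):         # last-event flag discussed in this thread
        # Restore intra-transaction order using the ordinal tag.
        records = sorted(open_txns.pop(txn_id),
                         key=lambda e: e["transactionEventCounter"])  # assumed field name
        handle_domain_event(txn_id, records)
```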

You can find additional information and recommendations in our admin guide.

Status changed to: Closed - Already Available
padraigdebuitlear
Contributor III

@Ola_Mayer  

See page 708 under limitations:

"Batch optimized apply mode is not supported. If this mode is set, the task will
automatically switch to Transactional apply mode and issue an appropriate
warning.
For more information on these modes, see Change Processing Tuning."

https://help.qlik.com/en-US/replicate/Content/Replicate/April%202020/Setup_User_Guide.pdf

You seem a bit too eager to close off the idea!

Am I missing something here?

padraigdebuitlear
Contributor III

I think what I was asking for was misunderstood. I'm looking for a single event/payload for each unique transaction; that single event would include all the records that make up the transaction. As far as I can see, that functionality is not already available. Please read my original post above.

Ola_Mayer
Employee

Unfortunately, what you are asking for is impossible to implement in the current solution for the Kafka target.

We have no way of knowing when a transaction will actually begin and end. The environment might have a mixture of very long- and short-running transactions; in fact, we have seen instances where a transaction generated events for more than 24 hours. The analysis of where a transaction begins and ends must happen post-delivery, after all events are collected at the target.

I believe the best approach to your request is for Replicate to implement Kafka as a source and take care of combining all of a transaction's events on the other end of the process.

But that is a different feature request, which you are welcome to submit.

padraigdebuitlear
Contributor III

Does your comment above not contradict the documentation:

"During a task's CDC stage, committed changes that are detected by the Qlik Replicate
source endpoint are grouped by transaction, sorted internally in chronological order, and
then propagated to the target endpoint"

What does the transactionLastEvent field indicate? Does it not indicate the end of a transaction?

Ola_Mayer
Employee

The events are grouped by transaction, but that doesn't guarantee that all of a transaction's events are included in a single group. The boundaries of the group are dictated by the size of the buffer and by time.
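A toy sketch of that flushing behavior (the specific time bound and event shape are illustrative assumptions, not Replicate internals) shows why a long transaction can straddle consecutive batches:

```python
import time

BATCH_LIMIT = 500      # batch size limit mentioned earlier in this thread
FLUSH_AFTER_S = 1.0    # illustrative time bound, not a real Replicate setting

batch = []
last_flush = time.monotonic()

def send(events):
    print(f"sending batch of {len(events)} events")

def on_event(event):
    # The batch is flushed when it is full or the time bound elapses,
    # regardless of whether the current transaction is complete -- so one
    # transaction's events can end up split across consecutive batches.
    global batch, last_flush
    batch.append(event)
    if len(batch) >= BATCH_LIMIT or time.monotonic() - last_flush >= FLUSH_AFTER_S:
        send(batch)
        batch = []
        last_flush = time.monotonic()
```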