
Kafka partitioning by transaction ID

padraigdebuitlear
Contributor III

We are looking to implement CDC from Oracle to Kafka, e.g. for a CQRS/Event Sourcing-type pattern.

I was reading the "Apache Kafka® Transaction Data Streaming" book.

https://www.qlik.com/us/-/media/files/resource-library/global-us/register/ebooks/eb-apache-kafka-tra...

In the book, on page 26, Table 4-2 lists a Kafka partitioning option: "Partition by transaction ID".
My first question is as follows. As this is a Qlik/Confluent book, I would expect this option to be supported in Qlik Replicate. Is "Partition by transaction ID" supported by Qlik? I don't see the capability in the product documentation.
Can this be put on the backlog as a change request, please?
https://help.qlik.com/en-US/replicate/Content/Replicate/April%202020/Setup_User_Guide.pdf


Other questions:

  • On a refresh/reload, is there transaction detail in the messages? Is the transaction ID the same on all messages?

  • Again, with regard to multiple table writes within a single transaction, does Qlik Replicate give any guarantees about the ordering/sequence of the messages put into Kafka?

  • Does Qlik Replicate have the ability to let a consumer know the boundaries of transactions in the events it produces, similar to Debezium's transaction marker functionality?

John_Teichman
Former Employee

Hello @padraigdebuitlear,

Thanks for your idea submissions. Your submission contains two ideas. Would you mind splitting it into separate ideas? When a single post contains multiple ideas, the review and tracking process is slowed down. Please refer to the submission guidelines for more information.

Ola_Mayer
Employee

Can this be put on the backlog as a change request please?

We do have the "Partition by transaction ID" capability, using the virtual column $partition, for which you can define an expression that may include the header field AR_H_TRANSACTION_ID. Such an expression might look like this: ifnull($AR_H_TRANSACTION_ID, 'FullLoadNoTransactionID'). However, we do not actually recommend it. To get the correct order of the transactions (as opposed to the order within a transaction), consumers have to coordinate across partitions, waiting for the other transactions/partitions to finish emitting their events. That eventually leads to the conclusion that a single partition is the best approach, which in turn comes at a clear cost in performance.
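To make the trade-off concrete, here is a minimal Python sketch of what a transaction-ID partition key implies. It emulates the ifnull expression quoted above and then hashes the resulting key onto a fixed number of partitions. The function names, the partition count and the use of CRC32 are illustrative assumptions; the sketch is not meant to reproduce how Replicate maps a $partition expression to a Kafka partition. Note that the answer to the first question below says full-load messages carry an empty-string transaction ID rather than a null one. ifnull leaves an empty string untouched, but either way every full-load row shares a single key and therefore lands on a single partition.

```python
import zlib
from typing import Optional

# Fallback label taken from the example expression quoted above.
FULL_LOAD_KEY = "FullLoadNoTransactionID"


def partition_key(transaction_id: Optional[str]) -> str:
    """Emulate ifnull($AR_H_TRANSACTION_ID, 'FullLoadNoTransactionID').

    Like ifnull, only a null (None) is replaced; an empty string is kept as-is.
    """
    return FULL_LOAD_KEY if transaction_id is None else transaction_id


def assign_partition(transaction_id: Optional[str], num_partitions: int) -> int:
    """Map the partition key onto [0, num_partitions) with a stable hash.

    CRC32 is an illustrative stand-in (an assumption), not the partitioner
    that Replicate or librdkafka applies to $partition values.
    """
    key = partition_key(transaction_id).encode("utf-8")
    return zlib.crc32(key) % num_partitions


if __name__ == "__main__":
    # Hypothetical IDs: two CDC transactions, then full-load rows with no ID.
    for tx_id in ["tx-1001", "tx-1001", "tx-1002", None, None, ""]:
        print(repr(tx_id), "->", assign_partition(tx_id, num_partitions=6))
```

All events of one transaction map to the same partition, so the order within a transaction is preserved. Transactions that hash to different partitions have no relative order, which is exactly the cross-partition coordination problem described above.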

Can you please provide a use case and describe how you intend to consume it?

  • On a refresh/reload, is there transaction detail in the messages? Is the transaction ID the same on all messages?
    Yes. All messages produced during a refresh/reload carry the same transaction ID (an empty string), which is why we think this option is not the preferred one when Full Load and CDC run together in the same Replicate task.

  • Again, with regard to multiple table writes within a single transaction, does Qlik Replicate give any guarantees about the ordering/sequence of the messages put into Kafka?
    This is not a Qlik Replicate guarantee but a Kafka one. We use librdkafka to produce our messages, and it can guarantee the order/sequence of the messages produced into Kafka only if they are written to a single partition of a single topic and produced one batch at a time. The drawback is that performance becomes very low.

  • Does Qlik Replicate have the ability to let a consumer know the boundaries of transactions in the events it produces, similar to Debezium's transaction marker functionality?
    Yes. The Replicate Kafka messages carry a sequence number ("transactionEventCounter" = n), and the last message in the transaction is flagged ("transactionLastEvent" = true).
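A consumer could use these two markers to regroup events into complete transactions before applying them, for example to a CQRS read model. The Python sketch below is one possible approach under stated assumptions: messages have already been decoded into dicts, the transaction ID sits under a key named "transactionId" (a hypothetical name), the counter starts at 1 and increases by one within a transaction, and the table names are made up. It releases a transaction only once the event flagged as last has arrived and every counter from 1 to n is present, so it also tolerates events of one transaction arriving out of order, for example when merging several partitions.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, List

Event = Dict[str, Any]


def reassemble_transactions(events: Iterable[Event]) -> Iterator[List[Event]]:
    """Yield each transaction's events, in counter order, once it is complete.

    Assumptions (for illustration only):
      * each event is a decoded dict with the keys "transactionId" (hypothetical
        name), "transactionEventCounter" and "transactionLastEvent";
      * counters start at 1 and increase by 1 within a transaction.
    """
    pending: Dict[str, Dict[int, Event]] = defaultdict(dict)  # tx id -> {counter: event}
    last_counter: Dict[str, int] = {}  # tx id -> counter of the event flagged as last

    for event in events:
        tx_id = event["transactionId"]
        counter = event["transactionEventCounter"]
        pending[tx_id][counter] = event
        if event["transactionLastEvent"]:
            last_counter[tx_id] = counter

        n = last_counter.get(tx_id)
        # Complete only when the last event is known and counters 1..n are all present.
        if n is not None and set(pending[tx_id]) == set(range(1, n + 1)):
            buffered = pending.pop(tx_id)
            del last_counter[tx_id]
            yield [buffered[i] for i in range(1, n + 1)]


if __name__ == "__main__":
    # Hypothetical events from two interleaved transactions.
    sample = [
        {"transactionId": "A", "transactionEventCounter": 1, "transactionLastEvent": False, "table": "ORDERS"},
        {"transactionId": "B", "transactionEventCounter": 1, "transactionLastEvent": True, "table": "PAYMENTS"},
        {"transactionId": "A", "transactionEventCounter": 2, "transactionLastEvent": True, "table": "ORDER_LINES"},
    ]
    for tx in reassemble_transactions(sample):
        print(tx[0]["transactionId"], [e["table"] for e in tx])
    # prints:
    # B ['PAYMENTS']
    # A ['ORDERS', 'ORDER_LINES']
```

Because full-load messages all share the same empty transaction ID, a sketch like this is meant for the CDC stream only. Buffering per transaction also means memory use grows with the size of the largest open transaction.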
Status changed to: Closed - Already Available