We are converting from Debezium to Qlik Replicate. Two fields Debezium gives us are:
"__source_change_lsn": {
"string": "008da661:000ec9ed:0002"
},
"__source_commit_lsn": {
"string": "008da661:000ec9ed:002b"
},
It appears that one of these corresponds to the Qlik message field "streamPosition", but we are having trouble finding an equivalent for the other.
Hello @rsbothwell ,
Thanks for choosing Qlik Replicate.
From the LSN format I assume your source is SQL Server and your target is Kafka; the answer below is based on those endpoints.
Qlik Replicate does not have the exact same two fields as Debezium, although streamPosition is a similar one. Among the Replicate header fields there are two other related fields, TIMESTAMP and COMMIT_TIMESTAMP, which are the UTC timestamps of the source change event and of the transaction commit, respectively. In general an LSN and its UTC timestamp can be converted to each other, as they have a one-to-one mapping, however such conversion is not recommended in a Kafka solution.
So the most important question is: why do you need the two fields? If you want to sort the source change events, Replicate offers two options:
1. Write all changes of one source table to a single given partition of a topic. Kafka guarantees the order of changes within a partition, though this is not the best approach from a performance point of view.
2. Have your application sort the changes by the streamPosition and OPERATION fields while consuming the messages.
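A minimal consumer-side sketch of option 2, assuming the messages have already been read from Kafka into dicts. The field names follow the Replicate message header schema quoted later in this thread; the record values themselves are hypothetical, and the sketch assumes streamPosition segments are fixed-width so that lexicographic order matches change order.

```python
# Hypothetical change records as consumed from a Replicate Kafka topic.
records = [
    {"streamPosition": "0003", "operation": "UPDATE"},
    {"streamPosition": "0001", "operation": "INSERT"},
    {"streamPosition": "0002", "operation": "UPDATE"},
]

# streamPosition is a string; with fixed-width segments a plain
# lexicographic sort restores the source change order.
ordered = sorted(records, key=lambda r: r["streamPosition"])

print([r["operation"] for r in ordered])  # ['INSERT', 'UPDATE', 'UPDATE']
```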
Hope this helps.
John.
Hi John, @john_wang , thanks for the reply. Debezium can emit duplicate messages, and we use the two _lsn fields to merge those duplicates. We were looking at using transactionId as a substitute for commit_lsn. From what you are saying, can we use TIMESTAMP instead? We don't have COMMIT_TIMESTAMP in our metadata:
"fields": [{
"name": "operation",
"type": {
"type": "enum",
"name": "operation",
"symbols": ["INSERT", "UPDATE", "DELETE", "REFRESH"]
}
}, {
"name": "changeSequence",
"type": "string"
}, {
"name": "timestamp",
"type": "string"
}, {
"name": "streamPosition",
"type": "string"
}, {
"name": "transactionId",
"type": "string"
}, {
"name": "changeMask",
"type": ["null", "bytes"],
"default": null
}, {
"name": "columnMask",
"type": ["null", "bytes"],
"default": null
}, {
"name": "transactionEventCounter",
"type": ["null", "long"],
"default": null
}, {
"name": "transactionLastEvent",
"type": ["null", "boolean"],
"default": null
}
]
}
Hello @rsbothwell ,
Apologies for the delayed reply. You can use streamPosition to eliminate duplicate messages.
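As a sketch, deduplication keyed on streamPosition might look like the following. It assumes each distinct change event carries a distinct streamPosition value and that duplicate deliveries repeat the same value; the messages shown are hypothetical.

```python
# Hypothetical consumed messages, including one duplicate delivery.
messages = [
    {"streamPosition": "0001", "operation": "INSERT"},
    {"streamPosition": "0002", "operation": "UPDATE"},
    {"streamPosition": "0002", "operation": "UPDATE"},  # duplicate
]

seen = set()
deduped = []
for msg in messages:
    pos = msg["streamPosition"]
    if pos not in seen:       # keep only the first delivery of each position
        seen.add(pos)
        deduped.append(msg)

print(len(deduped))  # 2
```

In a long-running consumer the `seen` set would need bounding (e.g. keeping only recent positions), since streamPosition values are monotonically increasing per table.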
Regards,
John.
Hi John,
I hope you are doing well. Is there a recommended part of the streamPosition value to use for sorting the records in sequence?
For example:
3332;637844887646504050;20210612080418483693|00000005.8058fbb7.00000001.0002.01.0000:184334.179713.16
I know that the SCN can be obtained by converting the hexadecimal value "000000058058fbb7" to decimal and then sorting. However, this is a bit inconvenient, since every value needs to be converted to decimal first.
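The conversion just described can be sketched as follows, using the streamPosition value above (assuming the first two dot-separated hex fields after the "|" concatenate to the SCN, as stated):

```python
stream_position = ("3332;637844887646504050;20210612080418483693"
                   "|00000005.8058fbb7.00000001.0002.01.0000:184334.179713.16")

# Take the portion after '|', split its dot-separated fields, and
# join the first two hex fields to form the SCN.
hex_part = stream_position.split("|")[1]
fields = hex_part.split(".")
scn = int(fields[0] + fields[1], 16)  # "000000058058fbb7" -> decimal

print(scn)  # 23628151735
```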
I noticed that the second part of the value appears to be unique and in the correct sequence: 637844887646504050. Can this be used to sequence the changes in the exact order they occurred on the source? If not, which part of the streamPosition column does Qlik recommend using for sorting? Thank you.
Regards,
Mohammed