When PK fields change on the source system, Replicate sends the previous PK data to Kafka as part of the envelope of the update message. However, the update message is keyed by the new PK data, so it can land on a different Kafka partition than the earlier messages for the old key. This can affect downstream consumers, since there are no ordering guarantees for messages on different partitions.
The idea is for Replicate to send Kafka a tombstone record (a message with a null value) for each key that no longer exists on the source system, rather than only adding the previous PK as metadata (the “beforeData”) in the envelope of a message with a completely different key.
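To illustrate why the old and new keys can end up on different partitions, here is a minimal sketch of key-based partitioning. It is not Replicate's or Kafka's actual code: `partition_for` is a hypothetical helper, the JSON key serialization and the partition count of 6 are assumptions, and `zlib.crc32` is only a stand-in for the murmur2 hash used by Kafka's default partitioner.

```python
import json
import zlib


def partition_for(key: dict, num_partitions: int) -> int:
    """Map a message key to a partition by hashing its serialized bytes."""
    # Serialize deterministically so the same key always yields the same bytes.
    key_bytes = json.dumps(key, sort_keys=True).encode("utf-8")
    # crc32 is a stand-in here; Kafka's default partitioner uses murmur2.
    return zlib.crc32(key_bytes) % num_partitions


old_key = {"personId": 1, "emailType": "office"}
new_key = {"personId": 1, "emailType": "personal"}

# The partition depends only on the key bytes, so a changed PK is hashed
# independently of the old PK.
old_partition = partition_for(old_key, 6)
new_partition = partition_for(new_key, 6)
```

Whether the two keys actually land on different partitions depends on the hash and the partition count. The point is only that nothing forces them onto the same one.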
Here is a use case: suppose there is an Email table in the source database whose primary key fields are PersonId and EmailType, and a record where PersonId=1 and EmailType=Office. Now suppose someone UPDATEs that record so that it has PersonId=1 and EmailType=Personal.
Current behavior:
Partition A:
Message1: Key - { “personId”: 1, “emailType”: “office” } / Value – { “address”: something@domain.com , “beforeData”: null}
Partition B:
Message1: Key - { “personId”: 1, “emailType”: “personal” } / Value – { “address”: something@domain.com , “beforeData”: { “personId”: 1, “emailType”: “office” } }
With the current behavior, the consumer has to inspect the “beforeData” of every message and create the tombstone for the original key itself, even though the update arrives on the new partition (B) while the original key lives on partition A. This adds complexity on the consuming side.
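The consumer-side workaround might look roughly like the sketch below. It assumes the consumer keeps a simple in-memory view keyed by the serialized message key, and that “beforeData” holds the previous PK fields as in the example above. `key_bytes` and `apply_update` are hypothetical names, not part of any Replicate or Kafka API.

```python
import json


def key_bytes(key: dict) -> bytes:
    """Serialize a key deterministically so it can index the local view."""
    return json.dumps(key, sort_keys=True).encode("utf-8")


def apply_update(state: dict, key: dict, value: dict) -> None:
    """Apply one message to the consumer's view, handling PK changes."""
    before = value.get("beforeData")
    if before is not None and before != key:
        # The PK changed: the consumer must delete the old key itself,
        # because no tombstone was ever produced for it.
        state.pop(key_bytes(before), None)
    # Store the row without the envelope metadata.
    state[key_bytes(key)] = {k: v for k, v in value.items() if k != "beforeData"}


office = {"personId": 1, "emailType": "office"}
personal = {"personId": 1, "emailType": "personal"}

state = {}
apply_update(state, office, {"address": "something@domain.com", "beforeData": None})
apply_update(state, personal, {"address": "something@domain.com", "beforeData": office})
```

This only gives the right result when the two messages are processed in the order shown. Since they sit on different partitions, a consumer could see the update first, in which case the delete is a no-op and the stale office row is stored afterwards.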
Tomb-stoning methodology:
Partition A:
Message1: Key - { “personId”: 1, “emailType”: “office” } / Value – { “address”: something@domain.com }
Message2: Key - { “personId”: 1, “emailType”: “office” } / Value – null
Partition B:
Message1: Key - { “personId”: 1, “emailType”: “personal” } / Value – { “address”: something@domain.com }
With the tomb-stoning methodology, Message2 would be the tombstone message for the key { “personId”: 1, “emailType”: “office” }. It would be emitted by the producer, avoiding the extra complexity on the consuming side.
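On the producing side, the requested behavior amounts to expanding one PK-changing update into two records. The sketch below shows that idea only. `expand_update` is a hypothetical function, not an existing Replicate setting or API, and records are modeled as plain (key, value) tuples rather than real Kafka producer calls.

```python
from typing import Iterator, Optional, Tuple

# A record is a (key, value) pair; a value of None represents a tombstone.
Record = Tuple[dict, Optional[dict]]


def expand_update(new_key: dict, value: dict,
                  before_key: Optional[dict]) -> Iterator[Record]:
    """Yield the records to produce for a single update on the source."""
    if before_key is not None and before_key != new_key:
        # PK changed: emit a tombstone under the old key, so it is routed
        # to the same partition as the earlier messages for that key.
        yield before_key, None
    # Always emit the current row under its (possibly new) key.
    yield new_key, value


office = {"personId": 1, "emailType": "office"}
personal = {"personId": 1, "emailType": "personal"}
row = {"address": "something@domain.com"}

records = list(expand_update(personal, row, before_key=office))
```

Because the tombstone carries the old key, it follows Message1 on partition A, and each partition can be consumed on its own without inspecting “beforeData”.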