
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Kafka Partition strategy by primary key
Hello,
I would like to have information when partitioning for kafka.
when choosing the Partition strategy:
- By message key
and Message key:
- Primary key columns
In the case of an update, the primary key values used are :
1) after modification (in the data object)
2)before modification (in the beforedata object)
and is it possible to choose between the 2 possibilities
Thank you in advance for your answer
yves savean
Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello @Yves_s ,
Well, you can use below steps to control the messages go to a desired partition:
1. In the table setting, add a new column, its name is "$partition", type is default string(50), as
2. Press the "fx" and input the expression as "ifnull($BI__part,$part)", where "part" is the column name which you want to use it to control the partition number.
In your example, if you want to use "COL1" as the partition divider, then the expression should be:
ifnull($BI__COL1,$COL1)
Take note this is a sample only, it can be used for several partitions only (for example the topic have 5 partitions, and the COL1 values are 0~4 only). If you have a lot of partitions then you need modify the expression to fit your env.
BTW, the "ifnull" is used for Full Load stage. in FL the $BI__COL1 is null, in this case we use the column value $COL1.
Hope this helps.
Regards,
John.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello @Yves_s ,
I'm not very sure if I understood the question exactly.
Are you trying to use "Primary key columns" value to decide which partition a message go ? You are asking if the partition number can be decide by the Primary Key BeforeImage value, or the Primary Key AfterImage value? if this is your question then we have options. If not, then please let me know what's the exact question, a sample is helpful I think.
Regards,
John.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
When we write an avro message in the topic, we get a message like:
{
"type": "record",
"name": "DataRecord",
"namespace": "com.attunity.queue.msg.TABLE",
"fields": [{
"name": "data",
"type": {
"type": "record",
"name": "Data",
"fields": [{
"name": "COL1",
"type": ["null", "string"],
"default": null
}, {
"name": "COL2",
"type": ["null", {
"type": "bytes",
"precision": 5,
"scale": 0,
"logicalType": "decimal"
}],
"default": null
}]
}
}, {
"name": "beforeData",
"type": ["null", "Data"],
"default": null
}, {
"name": "headers",
"type": {
"type": "record",
"name": "Headers",
"namespace": "com.attunity.queue.msg",
"fields": [{
"name": "operation",
"type": {
"type": "enum",
"name": "operation",
"symbols": ["INSERT", "UPDATE", "DELETE", "REFRESH"]
}
}, {
"name": "changeSequence",
"type": "string"
}, {
"name": "timestamp",
"type": "string"
}, {
"name": "streamPosition",
"type": "string"
}, {
"name": "transactionId",
"type": "string"
}, {
"name": "changeMask",
"type": ["null", "bytes"],
"default": null
}, {
"name": "columnMask",
"ty": ["null", "bytes"],
"default": null
}, {
"name": "transactionEventCounter",
"type": ["null", "long"],
"default": null
}, {
"name": "transactionLastEvent",
"type": ["null", "boolean"],
"default": null
}]
}
}]
}
the "beforeData" section is complete in the case of an update.
my question is:
when we partition by primary key , in this exemple the primary key is COL1.
In insertion and deletion the value used is the value found in the "data" section
But in update the value used is found in the "beforeData" section ? Or is always the value found in the "data" section?
I think in case it is not a real primary key
1) insert was launch
COL1 = 1 in data.COL1 -> partition 1 of topic
2) update was launch
COL1=2(data.COL1) ->(other partition) but in beforeData.COL1= 1 -> partition 1 of topic
i hope i was clear

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello @Yves_s ,
Excuse me I still did not learn it exactly. Let's change a way to confirm the question.
Assuming you are using the PK (Primary Key column in source table, in your sample it's "COL1") as the partition number in Kafka. For Insert/Delete operations the messages go to the correct partition, the UPDATE messages go to the correct partition too if the PK value was not changed. However sometimes the PK was changed (in scenario 3 below):
- 1. in Full Load, the row goes to partition "1" (because of "COL1" value is "1")
- 2. In CDC Update where PK is NOT changed, the row goes to partition "1" (because of "COL1" value is "1")
- 3. In CDC Update is PK changed, the row goes to partition "2". (because of "COL1" value is "2")
(assume you updated the PK like: "update tbl set COL1= 2 where COL1= 1")
By default the update operation message in scenario (3) will go to the partition "2", however you want it goes to partition "1" still. Let me know if is this what you are looking for?
thank you,
John.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
It's exactly that. how i can keep partition "1" if the "fake primary key" changes

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello @Yves_s ,
Well, you can use below steps to control the messages go to a desired partition:
1. In the table setting, add a new column, its name is "$partition", type is default string(50), as
2. Press the "fx" and input the expression as "ifnull($BI__part,$part)", where "part" is the column name which you want to use it to control the partition number.
In your example, if you want to use "COL1" as the partition divider, then the expression should be:
ifnull($BI__COL1,$COL1)
Take note this is a sample only, it can be used for several partitions only (for example the topic have 5 partitions, and the COL1 values are 0~4 only). If you have a lot of partitions then you need modify the expression to fit your env.
BTW, the "ifnull" is used for Full Load stage. in FL the $BI__COL1 is null, in this case we use the column value $COL1.
Hope this helps.
Regards,
John.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
it's perfect thanks
