Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello,
I would like to have information when partitioning for kafka.
when choosing the Partition strategy:
- By message key
and Message key:
- Primary key columns
In the case of an update, the primary key values used are :
1) after modification (in the data object)
2)before modification (in the beforedata object)
and is it possible to choose between the 2 possibilities
Thank you in advance for your answer
yves savean
Hello @Yves_s ,
Well, you can use below steps to control the messages go to a desired partition:
1. In the table setting, add a new column, its name is "$partition", type is default string(50), as
2. Press the "fx" and input the expression as "ifnull($BI__part,$part)", where "part" is the column name which you want to use it to control the partition number.
In your example, if you want to use "COL1" as the partition divider, then the expression should be:
ifnull($BI__COL1,$COL1)Take note this is a sample only, it can be used for several partitions only (for example the topic have 5 partitions, and the COL1 values are 0~4 only). If you have a lot of partitions then you need modify the expression to fit your env.
BTW, the "ifnull" is used for Full Load stage. in FL the $BI__COL1 is null, in this case we use the column value $COL1.
Hope this helps.
Regards,
John.
Hello @Yves_s ,
I'm not very sure if I understood the question exactly.
Are you trying to use "Primary key columns" value to decide which partition a message go ? You are asking if the partition number can be decide by the Primary Key BeforeImage value, or the Primary Key AfterImage value? if this is your question then we have options. If not, then please let me know what's the exact question, a sample is helpful I think.
Regards,
John.
When we write an avro message in the topic, we get a message like:
{
"type": "record",
"name": "DataRecord",
"namespace": "com.attunity.queue.msg.TABLE",
"fields": [{
"name": "data",
"type": {
"type": "record",
"name": "Data",
"fields": [{
"name": "COL1",
"type": ["null", "string"],
"default": null
}, {
"name": "COL2",
"type": ["null", {
"type": "bytes",
"precision": 5,
"scale": 0,
"logicalType": "decimal"
}],
"default": null
}]
}
}, {
"name": "beforeData",
"type": ["null", "Data"],
"default": null
}, {
"name": "headers",
"type": {
"type": "record",
"name": "Headers",
"namespace": "com.attunity.queue.msg",
"fields": [{
"name": "operation",
"type": {
"type": "enum",
"name": "operation",
"symbols": ["INSERT", "UPDATE", "DELETE", "REFRESH"]
}
}, {
"name": "changeSequence",
"type": "string"
}, {
"name": "timestamp",
"type": "string"
}, {
"name": "streamPosition",
"type": "string"
}, {
"name": "transactionId",
"type": "string"
}, {
"name": "changeMask",
"type": ["null", "bytes"],
"default": null
}, {
"name": "columnMask",
"ty": ["null", "bytes"],
"default": null
}, {
"name": "transactionEventCounter",
"type": ["null", "long"],
"default": null
}, {
"name": "transactionLastEvent",
"type": ["null", "boolean"],
"default": null
}]
}
}]
}
the "beforeData" section is complete in the case of an update.
my question is:
when we partition by primary key , in this exemple the primary key is COL1.
In insertion and deletion the value used is the value found in the "data" section
But in update the value used is found in the "beforeData" section ? Or is always the value found in the "data" section?
I think in case it is not a real primary key
1) insert was launch
COL1 = 1 in data.COL1 -> partition 1 of topic
2) update was launch
COL1=2(data.COL1) ->(other partition) but in beforeData.COL1= 1 -> partition 1 of topic
i hope i was clear
Hello @Yves_s ,
Excuse me I still did not learn it exactly. Let's change a way to confirm the question.
Assuming you are using the PK (Primary Key column in source table, in your sample it's "COL1") as the partition number in Kafka. For Insert/Delete operations the messages go to the correct partition, the UPDATE messages go to the correct partition too if the PK value was not changed. However sometimes the PK was changed (in scenario 3 below):
By default the update operation message in scenario (3) will go to the partition "2", however you want it goes to partition "1" still. Let me know if is this what you are looking for?
thank you,
John.
It's exactly that. how i can keep partition "1" if the "fake primary key" changes
Hello @Yves_s ,
Well, you can use below steps to control the messages go to a desired partition:
1. In the table setting, add a new column, its name is "$partition", type is default string(50), as
2. Press the "fx" and input the expression as "ifnull($BI__part,$part)", where "part" is the column name which you want to use it to control the partition number.
In your example, if you want to use "COL1" as the partition divider, then the expression should be:
ifnull($BI__COL1,$COL1)Take note this is a sample only, it can be used for several partitions only (for example the topic have 5 partitions, and the COL1 values are 0~4 only). If you have a lot of partitions then you need modify the expression to fit your env.
BTW, the "ifnull" is used for Full Load stage. in FL the $BI__COL1 is null, in this case we use the column value $COL1.
Hope this helps.
Regards,
John.
it's perfect thanks