Skip to main content
Announcements
UPGRADE ADVISORY for Qlik Replicate 2024.5: Read More
cancel
Showing results for 
Search instead for 
Did you mean: 
Yves_s
Contributor III

Kafka Partition strategy by primary key

Hello,

I would like to have information when partitioning for kafka.

when choosing the Partition strategy:

- By message key

and Message key:

Primary key columns

In the case of an update, the primary key values ​​used are :

1) after modification (in the data object)

2)before modification (in the beforedata object)

and is it possible to choose between the 2 possibilities

 

Thank you in advance for your answer

yves savean

 

Labels (2)
1 Solution

Accepted Solutions
john_wang
Support

Hello @Yves_s ,

Well, you can use below steps to control the messages go to a desired partition:

1. In the table setting, add a new column, its name is "$partition", type is default string(50), as

    john_wang_0-1645106144780.png

2. Press the "fx" and input the expression as "ifnull($BI__part,$part)", where "part" is the column name which you want to use it to control the partition number.

    john_wang_1-1645106286797.png

    In your example, if you want to use "COL1" as the partition divider, then the expression should be:

ifnull($BI__COL1,$COL1)

    Take note this is a sample only, it can be used for several partitions only (for example the topic have 5 partitions, and the COL1 values are 0~4 only). If you have a lot of partitions then you need modify the expression to fit your env.

    BTW, the "ifnull" is used for Full Load stage. in FL the $BI__COL1 is null, in this case we use the column value $COL1. 

Hope this helps.

Regards,

John.

Help users find answers! Do not forget to mark a solution that worked for you! If already marked, give it a thumbs up!

View solution in original post

6 Replies
john_wang
Support

Hello @Yves_s ,

I'm not very sure if I understood the question exactly. 

Are you trying to use "Primary key columns" value to decide which partition a message go ? You are asking if the partition number can be decide by the Primary Key BeforeImage value, or the Primary Key AfterImage value? if this is your question then we have options. If not, then please let me know what's the exact question, a sample is helpful I think.

Regards,

John.

Help users find answers! Do not forget to mark a solution that worked for you! If already marked, give it a thumbs up!
Yves_s
Contributor III
Author

When we write an avro message in the topic, we get a message like:

{
	"type": "record",
	"name": "DataRecord",
	"namespace": "com.attunity.queue.msg.TABLE",
	"fields": [{
		"name": "data",
		"type": {
			"type": "record",
			"name": "Data",
			"fields": [{
				"name": "COL1",
				"type": ["null", "string"],
				"default": null
			}, {
				"name": "COL2",
				"type": ["null", {
					"type": "bytes",
					"precision": 5,
					"scale": 0,
					"logicalType": "decimal"
				}],
				"default": null
			}]
		}
	}, {
		"name": "beforeData",
		"type": ["null", "Data"],
		"default": null
	}, {
		"name": "headers",
		"type": {
			"type": "record",
			"name": "Headers",
			"namespace": "com.attunity.queue.msg",
			"fields": [{
				"name": "operation",
				"type": {
					"type": "enum",
					"name": "operation",
					"symbols": ["INSERT", "UPDATE", "DELETE", "REFRESH"]
				}
			}, {
				"name": "changeSequence",
				"type": "string"
			}, {
				"name": "timestamp",
				"type": "string"
			}, {
				"name": "streamPosition",
				"type": "string"
			}, {
				"name": "transactionId",
				"type": "string"
			}, {
				"name": "changeMask",
				"type": ["null", "bytes"],
				"default": null
			}, {
				"name": "columnMask",
				"ty": ["null", "bytes"],
				"default": null
			}, {
				"name": "transactionEventCounter",
				"type": ["null", "long"],
				"default": null
			}, {
				"name": "transactionLastEvent",
				"type": ["null", "boolean"],
				"default": null
			}]
		}
	}]
}

 

the "beforeData" section  is complete in the case of an update.

my question is:

when we partition by primary key , in this exemple the primary key is COL1.

In insertion and deletion the value used is the value found in the "data" section

But in update the value used is  found in the "beforeData" section ? Or is always the value found in the "data" section? 

 

I think in case it is not a real primary key

1) insert was launch

COL1 = 1 in data.COL1 -> partition 1 of topic

2) update was launch 

 COL1=2(data.COL1) ->(other partition)  but in beforeData.COL1= 1 -> partition 1 of topic

i hope i was clear

 

 

john_wang
Support

Hello @Yves_s  ,

Excuse me I still did not learn it exactly. Let's change a way to confirm the question.

Assuming you are using the PK (Primary Key column in source table, in your sample it's "COL1") as the partition number in Kafka. For Insert/Delete operations the messages go to the correct partition, the UPDATE messages go to the correct partition too if the PK value was not changed. However sometimes the PK was changed (in scenario 3 below):

  • 1. in Full Load, the row goes to partition "1" (because of "COL1" value is "1")
  • 2. In CDC Update where PK is NOT changed, the row goes to partition "1" (because of "COL1" value is "1")
  • 3. In CDC Update is PK changed, the row goes to partition "2". (because of "COL1" value is "2")
    (assume you updated the PK like: "update tbl set COL1= 2 where COL1= 1")

By default the update operation message in scenario (3) will go to the partition "2", however you want it goes to partition "1" still. Let me know if is this what you are looking for?

thank you,

John.

Help users find answers! Do not forget to mark a solution that worked for you! If already marked, give it a thumbs up!
Yves_s
Contributor III
Author

It's exactly that. how i can keep partition "1" if the "fake primary key" changes

 

 

john_wang
Support

Hello @Yves_s ,

Well, you can use below steps to control the messages go to a desired partition:

1. In the table setting, add a new column, its name is "$partition", type is default string(50), as

    john_wang_0-1645106144780.png

2. Press the "fx" and input the expression as "ifnull($BI__part,$part)", where "part" is the column name which you want to use it to control the partition number.

    john_wang_1-1645106286797.png

    In your example, if you want to use "COL1" as the partition divider, then the expression should be:

ifnull($BI__COL1,$COL1)

    Take note this is a sample only, it can be used for several partitions only (for example the topic have 5 partitions, and the COL1 values are 0~4 only). If you have a lot of partitions then you need modify the expression to fit your env.

    BTW, the "ifnull" is used for Full Load stage. in FL the $BI__COL1 is null, in this case we use the column value $COL1. 

Hope this helps.

Regards,

John.

Help users find answers! Do not forget to mark a solution that worked for you! If already marked, give it a thumbs up!
Yves_s
Contributor III
Author

it's perfect thanks