This article explains the steps to create two Talend Data Integration (DI) jobs: one DI job that acts as a Kafka producer and another that acts as a Kafka consumer.
Both jobs use the 'Schema Registry' option in the tKafka components.
According to the Confluent documentation, Schema Registry (or Confluent Schema Registry) "provides a centralized repository for managing and validating schemas for topic message data". Refer to the
Confluent documentation for more information about Confluent Schema Registry.
This article is based on the Talend documentation for Kafka scenarios, specifically the section titled
Reading and writing Avro data from Kafka using Standard Jobs.
Creating a Kafka topic using the Confluent Control Center UI.
It is possible to create Kafka topics programmatically or with command-line tools; a minimal programmatic sketch is included below for reference.
In this article's example scenario, however, the Confluent Control Center UI is used to create the Kafka topic.
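For reference, the sketch below creates the 'demo' topic with Kafka's Java AdminClient API. It is independent of the Talend jobs, and it assumes a broker reachable at localhost:9092 and a single-partition, single-replica topic; adjust these values for your cluster.

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateDemoTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Assumed broker address; replace with your cluster's bootstrap servers.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Create the 'demo' topic with 1 partition and replication factor 1.
            NewTopic demo = new NewTopic("demo", 1, (short) 1);
            admin.createTopics(Collections.singletonList(demo)).all().get();
        }
    }
}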
Create a topic called 'demo' using the 'Add topic' option. Refer to the screenshot below.
Set a schema for the 'value' of the topic 'demo'. Refer to the screenshot below.
Below is the sample schema used for this example topic.
{
  "type": "record",
  "namespace": "com.mycorp.mynamespace",
  "name": "sampleRecord",
  "doc": "Sample schema to help you get started.",
  "fields": [
    {
      "name": "id",
      "type": "string"
    },
    {
      "name": "amount",
      "type": "string"
    }
  ]
}
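For illustration, a message value conforming to this schema would look like the following (the values are hypothetical):

{
  "id": "1001",
  "amount": "250.00"
}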
For detailed examples of using Schema Registry, refer to the
tutorial in the Confluent documentation.
Creating the Kafka producer job.
After creating the topic, let's create a Talend DI producer job that publishes messages to the 'demo' topic.
The sample producer job includes a tFixedFlowInput component that defines the input values, a tJavaRow that creates the ProducerRecord object from the values defined in tFixedFlowInput, and finally a tKafkaOutput component that publishes the ProducerRecord to the topic 'demo'. Refer to the screenshots below; a plain-Java sketch of the equivalent logic follows them.
tKafkaOutput component - Basic settings.
tKafkaOutput component - Advanced settings.
tJavaRow - ProducerRecord.
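Outside of Talend, the Schema Registry option corresponds conceptually to a producer configured with the Confluent Avro serializer and a Schema Registry URL. The following is a minimal, hypothetical plain-Java sketch of the same flow; the broker and registry addresses, the key handling, and the field values are assumptions and are not taken from the job.

import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class DemoProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Assumed broker and Schema Registry addresses; adjust for your environment.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081");

        // Same sample schema as registered for the 'demo' topic value.
        Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"namespace\":\"com.mycorp.mynamespace\","
              + "\"name\":\"sampleRecord\",\"fields\":["
              + "{\"name\":\"id\",\"type\":\"string\"},"
              + "{\"name\":\"amount\",\"type\":\"string\"}]}");

        // Hypothetical values, analogous to what tFixedFlowInput supplies in the job.
        GenericRecord value = new GenericData.Record(schema);
        value.put("id", "1001");
        value.put("amount", "250.00");

        try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
            // Publish to the 'demo' topic, similar to what tKafkaOutput does.
            producer.send(new ProducerRecord<>("demo", "1001", value));
            producer.flush();
        }
    }
}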
For a more detailed explanation of the job, refer to the Talend documentation (
Reading and writing Avro data from Kafka using Standard Jobs).
Creating the Kafka consumer job.
After creating the producer DI job, let's now create a Talend DI job for the consumer.
The consumer DI job consists of tKafkaInput, tJavaRow, and tLogRow components.
tKafkaInput reads (or consumes) messages from the 'demo' topic. Refer to the screenshots below for the basic and advanced settings of tKafkaInput.
Since the property 'Stop after receiving a maximum number of messages' is set to 1, this sample job reads only one record.
tJavaRow is used to get the values from the ConsumerRecord object (the output of tKafkaInput). Refer to the screenshot below.
tLogRow prints the values.
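Outside of Talend, this consumer chain (tKafkaInput, tJavaRow, tLogRow) corresponds conceptually to a plain Java consumer configured with the Confluent Avro deserializer that stops after one record and prints the field values. The sketch below is a hypothetical equivalent; the broker and registry addresses, the consumer group, and the offset-reset setting are assumptions and are not taken from the job.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class DemoConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Assumed broker and Schema Registry addresses; adjust for your environment.
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-consumer-group");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "io.confluent.kafka.serializers.KafkaAvroDeserializer");
        props.put("schema.registry.url", "http://localhost:8081");

        try (KafkaConsumer<String, GenericRecord> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("demo"));
            boolean done = false;
            while (!done) {
                ConsumerRecords<String, GenericRecord> records =
                        consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, GenericRecord> record : records) {
                    // Extract the field values, analogous to the tJavaRow step.
                    GenericRecord value = record.value();
                    // Print the values, analogous to the tLogRow step.
                    System.out.println("id = " + value.get("id")
                            + ", amount = " + value.get("amount"));
                    // Stop after one message, like 'Stop after receiving a
                    // maximum number of messages' set to 1.
                    done = true;
                    break;
                }
            }
        }
    }
}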
Running the jobs in Talend Studio.
The screenshots below display the producer and consumer job execution results.