WSyahirah21
Creator

Consume Kafka topic and store it in Hive using Talend Studio

I am trying to create an ingestion job in Talend Studio using Kafka. The job will read the JSON data in the topic "work" and store it in a Hive table. My idea is to use the following workflow in Talend:

 

tKafkaInput > tLogRow > tJava > tMap

 

tKafkaInput and tLogRow: Consume the JSON data in topic "Work"

tJava: Fetch the JSON data and pass it to tMap

tMap: Structure the data and save it into the Hive table

0695b00000RkFFVAA3.png 

 

Note: A snippet of the JSON data from the Kafka topic, as output by tLogRow_1, is in the attachment (data).

 

The code in tJava that fetches the JSON data is essentially the following, which tries to read the "Vers" value from the JSON:

 

String output=((String)globalMap.get("tLogRow_1_OUTPUT"));

JSONObject jsonObject = new JSONObject(output);

System.out.println(jsonObject);

String sourceDBName=(jsonObject.getString("Vers"));

 

However, I received the error shown in the attachment (Error).

 

My questions are:

  1. Is my workflow the best practice for ingesting JSON data from a Kafka topic into a Hive table? Or are there other ways to do this in Talend Studio?
  2. If this workflow is correctly designed, how do I modify the Java code in the tJava component so that it can capture the JSON result from tLogRow and pass it on to tMap for the next step?

 

Any help is much appreciated, thanks.

 

 

 

7 Replies
Anonymous
Not applicable

Hi

tExtractJSONFields is the best component for extracting data from a JSON string. Please try it and let me know if it does not fit your needs or if you have any questions.

tKafkaInput > tExtractJSONFields > tMap

Regards

Shong
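
A note on question 2, as a hedged sketch rather than a confirmed diagnosis (the error attachment is not reproduced in this thread): tJava runs once when its subjob starts, not once per row, and as far as I know tLogRow does not put a "tLogRow_1_OUTPUT" entry into globalMap. If so, `output` would be null by the time `new JSONObject(output)` is called. The standalone Java below imitates that situation with a plain HashMap standing in for globalMap. The class name, the `stringField` helper, the sample message, and the regex used in place of a real JSON parser are all assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PerRowFieldSketch {

    // Naive stand-in for a JSON parser (assumption): finds "name": "value"
    // for a string-valued field anywhere in the text. Returns empty on null input.
    static Optional<String> stringField(String json, String name) {
        if (json == null) {
            return Optional.empty();
        }
        Pattern p = Pattern.compile("\"" + Pattern.quote(name) + "\"\\s*:\\s*\"([^\"]*)\"");
        Matcher m = p.matcher(json);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in for Talend's globalMap; nothing has stored "tLogRow_1_OUTPUT" in it.
        Map<String, Object> globalMap = new HashMap<>();
        String output = (String) globalMap.get("tLogRow_1_OUTPUT"); // null here
        System.out.println("from globalMap: " + stringField(output, "Vers").orElse("<no value>"));

        // Hypothetical message as it would arrive in one row of the flow.
        String row = "{\"RequestHeader\": {\"Vers\": \"1.0\"}}";
        System.out.println("from the row:   " + stringField(row, "Vers").orElse("<no value>"));
    }
}
```

The point of the sketch is only that the value has to come from the row itself. In the job, that is what tExtractJSONFields does for each incoming message, so no hand-written parsing should be needed.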

WSyahirah21
Creator
Author

Hi Shong, I have tried using tExtractJSONFields with the configuration below:

0695b00000SpUJHAA3.png

Here, I loop on the JSON path RequestHeader to fetch the data in it. However, when I run the job, Talend returns no results.

0695b00000SpUJbAAN.png 

Anonymous
Not applicable

Hi

Set the Loop Jsonpath query to "$.RequestHeader" and try again.

 

Regards

Shong
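
To illustrate what the loop query does, here is a minimal sketch. As I understand it, the loop query picks the node to iterate over, and the per-column queries in the mapping are then evaluated relative to that node. The code models the message as nested maps instead of using a JSONPath library. The `resolve` helper, the class name, and the sample fields other than "RequestHeader" and "Vers" are hypothetical, and it handles only simple dotted paths.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LoopPathSketch {

    // Resolves a simple dotted path such as "$.RequestHeader" or "Vers"
    // against nested maps. Returns null if any step is missing.
    static Object resolve(Object root, String path) {
        Object current = root;
        for (String key : path.split("\\.")) {
            if (key.equals("$") || key.isEmpty()) {
                continue; // "$" just denotes the root
            }
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    // Hypothetical message; the real payload is only in the thread's attachment.
    static Map<String, Object> sampleMessage() {
        Map<String, Object> header = new LinkedHashMap<>();
        header.put("Vers", "1.0");
        header.put("Source", "demo");
        Map<String, Object> message = new LinkedHashMap<>();
        message.put("RequestHeader", header);
        return message;
    }

    public static void main(String[] args) {
        // Step 1: the loop query selects the node to iterate over.
        Object loopNode = resolve(sampleMessage(), "$.RequestHeader");
        // Step 2: column queries are resolved relative to that node.
        System.out.println("Vers   = " + resolve(loopNode, "Vers"));
        System.out.println("Source = " + resolve(loopNode, "Source"));
    }
}
```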

WSyahirah21
Creator
Author

Hi Shong,

 

I am now trying to read nested JSON from two parent objects, as you can see in this image:

0695b00000SpkdSAAR.png

Currently, I am able to read the JSON data from RequestHeader using the tExtractJSONFields component (see the main connection between the two tExtractJSONFields components).

 

However, for the second component, the only connection I can make is "Reject" instead of Main/OnComponentOk. Is it possible to read the nested data this way?

 

Or do you have any other ideas? Thanks.

 

Reference I used : https://help.talend.com/r/Eizi~hPs0B4M_mO2ot6_1g/Ao7wb2mUfg1hug8GfwRXLw

Anonymous
Not applicable

Use a tReplicate after tKafkaInput to replicate the data flow so that you can read the JSON string several times, e.g.:

tKafkaInput--tReplicate--tExtractJSONFields1
                       --tExtractJSONFields2

WSyahirah21
Creator
Author

Hi Shong,

 

Is there any way to merge those two components in TDF? I tried tUnite and tMap, but neither worked well.

 

0695b00000SpqxJAAR.png

Anonymous
Not applicable

Store the results in tHashOutput, read the data back from memory using tHashInput, and merge the data in the next subjob.
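
The idea can be sketched in plain Java: one extracted flow is cached in memory, then the other flow is joined against it in a later step. This is only an illustration under assumptions. The join key "msgId", the field "Status", the class name, and the helpers `mergeByKey` and `row` are hypothetical, and the thread does not say which column the two flows share. In the job itself the join would be configured in a component (for example a tMap lookup), not written by hand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class HashMergeSketch {

    // Inner join: cache one flow by key (the in-memory store), then look up
    // each row of the other flow and combine the columns of matching rows.
    static List<Map<String, String>> mergeByKey(List<Map<String, String>> mainRows,
                                                List<Map<String, String>> cachedRows,
                                                String key) {
        Map<String, Map<String, String>> cache = new HashMap<>();
        for (Map<String, String> row : cachedRows) {
            cache.put(row.get(key), row);
        }
        List<Map<String, String>> merged = new ArrayList<>();
        for (Map<String, String> row : mainRows) {
            Map<String, String> match = cache.get(row.get(key));
            if (match != null) {
                Map<String, String> out = new LinkedHashMap<>(row);
                out.putAll(match); // add the columns from the cached flow
                merged.add(out);
            }
        }
        return merged;
    }

    // Small helper to build a row from alternating column names and values.
    static Map<String, String> row(String... kv) {
        Map<String, String> r = new LinkedHashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            r.put(kv[i], kv[i + 1]);
        }
        return r;
    }

    public static void main(String[] args) {
        // Hypothetical outputs of the two extraction branches.
        List<Map<String, String>> headers = Arrays.asList(row("msgId", "m1", "Vers", "1.0"));
        List<Map<String, String>> bodies = Arrays.asList(row("msgId", "m1", "Status", "OK"));
        System.out.println(mergeByKey(headers, bodies, "msgId"));
    }
}
```

Caching one side first is what makes the second subjob possible: the join only starts after both extractions have finished.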