bmatt
Contributor

JSON Deserialization in Big Data Batch Jobs from MongoDBInput

Hello! I am working in a Big Data Batch job and have objects being read in from a tMongoDBInput component. They are JSON objects, and I want to access them as strings so I can use tExtractJSONFields later in my pipeline (from my understanding, it works on strings). When I read the input, the values come through as java.lang.Object@<hashcode> (the JVM's default toString() output), and I cannot figure out how to get the actual JSON string out of these objects.

If it helps, I am running Spark 3.5.x and Talend Studio 2025-03.

1 Reply
gouravdubey5
Partner - Creator

Hello,

In Big Data Batch jobs, tMongoDBInput does not automatically deserialize MongoDB documents into individual Talend columns.

The component returns each document as a single JSON string or serialized structure (which is why you see the default java.lang.Object@... representation), and nested fields must be parsed explicitly.

Recommended approach:

1. Read the MongoDB document as a single JSON/Document field.
2. Use tExtractJSONFields, tMap, or custom Java logic to deserialize and extract the required fields (see the sketch after this list).
3. Define the output schema based on the extracted values.
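
If the runtime class of that Object column turns out to be the MongoDB driver's org.bson.Document (worth checking with value.getClass().getName()), a small Java routine is usually enough to recover the raw JSON string before handing it to tExtractJSONFields. A minimal sketch, assuming the driver's Document type is what arrives; the routine and column names below are my own, not something Talend generates:

    import org.bson.Document;

    public class DocToJson {

        // Turn a MongoDB document that arrives typed as a plain Object
        // into its JSON string form, so downstream components that work
        // on strings (such as tExtractJSONFields) can parse it.
        public static String toJsonString(Object value) {
            if (value == null) {
                return null;
            }
            if (value instanceof Document) {
                // org.bson.Document.toJson() renders the document as
                // MongoDB Extended JSON.
                return ((Document) value).toJson();
            }
            // Fallback: for a plain java.lang.Object this yields the
            // ClassName@hashcode form from the question, which means the
            // incoming value is not a Document and needs a different cast.
            return value.toString();
        }
    }

In Studio this could live in a user routine and be called from a tMap expression, e.g. DocToJson.toJsonString(row1.doc), where row1.doc stands in for whatever your Object column is actually named.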

Note:
Automatic schema mapping of nested JSON is not supported in Big Data Batch jobs; JSON parsing must be handled explicitly.
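
To make "explicit parsing" concrete: the driver's Document API can also navigate nested fields directly, as an alternative to JSONPath queries in tExtractJSONFields. A small illustration; the sample JSON and field names are invented for the example:

    import org.bson.Document;

    public class NestedFieldDemo {
        public static void main(String[] args) {
            // Parse a JSON string back into a Document (the reverse of
            // toJson() in the sketch above).
            Document doc = Document.parse(
                    "{\"customer\": {\"name\": \"Ada\", \"city\": \"Paris\"}}");

            // Nested objects come back as Documents and must be unwrapped
            // level by level -- nothing maps them to flat columns
            // automatically in a Big Data Batch job.
            Document customer = doc.get("customer", Document.class);
            String name = customer.getString("name");   // "Ada"
            String city = customer.getString("city");   // "Paris"
            System.out.println(name + " / " + city);
        }
    }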

Thanks,

Gourav

Talend Solution Architect | Data Integration