bmatt
Contributor

JSON Deserialization in Big Data Batch Jobs from MongoDBInput

Hello! I am working in a Big Data Batch job and have objects being read in from a tMongoDBInput component. They are JSON objects, and I want to access them as strings so I can use tExtractJSONFields later in my pipeline (from my understanding, it works on strings). When I read the input, the values come through as java.lang.Object@<hashcode> (the JVM's default toString() output), and I cannot figure out how to get the actual JSON string out of these objects.

If it helps, I am running Spark 3.5.x and Talend Studio 2025-03.

1 Reply
gouravdubey5
Partner - Creator

Hello,

In Big Data Batch jobs, tMongoDBInput does not automatically deserialize MongoDB documents into individual Talend columns.

The component returns each document as a single JSON string or serialized structure (which is why you see the default java.lang.Object@... representation), and nested fields must be parsed explicitly.

Recommended approach:

1. Read the MongoDB document as a single JSON/Document field.
2. Use tExtractJSONFields, tMap, or custom Java logic to deserialize and extract the required fields (see the sketch after this list).
3. Define the output schema based on the extracted values.
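
If the runtime class of that Object column turns out to be the MongoDB driver's org.bson.Document (worth checking with value.getClass().getName()), a small Java routine is usually enough to recover the raw JSON string before handing it to tExtractJSONFields. A minimal sketch, assuming the driver's Document type is what arrives; the routine and column names below are my own, not something Talend generates:

    import org.bson.Document;

    public class DocToJson {

        // Turn a MongoDB document that arrives typed as a plain Object
        // into its JSON string form, so downstream components that work
        // on strings (such as tExtractJSONFields) can parse it.
        public static String toJsonString(Object value) {
            if (value == null) {
                return null;
            }
            if (value instanceof Document) {
                // org.bson.Document.toJson() renders the document as
                // MongoDB Extended JSON.
                return ((Document) value).toJson();
            }
            // Fallback: for a plain java.lang.Object this yields the
            // ClassName@hashcode form from the question, which means the
            // incoming value is not a Document and needs a different cast.
            return value.toString();
        }
    }

In Studio this could live in a user routine and be called from a tMap expression, e.g. DocToJson.toJsonString(row1.doc), where row1.doc stands in for whatever your Object column is actually named.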

Note:
Automatic schema mapping of nested JSON is not supported in Big Data Batch jobs; JSON parsing must be handled explicitly.
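
To make "explicit parsing" concrete: the driver's Document API can also navigate nested fields directly, as an alternative to JSONPath queries in tExtractJSONFields. A small illustration; the sample JSON and field names are invented for the example:

    import org.bson.Document;

    public class NestedFieldDemo {
        public static void main(String[] args) {
            // Parse a JSON string back into a Document (the reverse of
            // toJson() in the sketch above).
            Document doc = Document.parse(
                    "{\"customer\": {\"name\": \"Ada\", \"city\": \"Paris\"}}");

            // Nested objects come back as Documents and must be unwrapped
            // level by level -- nothing maps them to flat columns
            // automatically in a Big Data Batch job.
            Document customer = doc.get("customer", Document.class);
            String name = customer.getString("name");   // "Ada"
            String city = customer.getString("city");   // "Paris"
            System.out.println(name + " / " + city);
        }
    }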

Thanks,

Gourav

Talend Solution Architect | Data Integration