Hello! I am working on a Big Data Batch job that reads objects from a tMongoDBInput component. They are JSON objects, and I want to access them as strings so that I can use tExtractJSONFields later in my pipeline (from my understanding, it works on strings). When I read the input, the values are printed as java.lang.Object@<serial_number>, and I cannot figure out how to get the actual JSON string out of these objects.
If it helps, I am running Spark 3.5.x and Talend Studio 2025-03.
Hello,
In Big Data Batch jobs, tMongoDBInput does not automatically deserialize MongoDB documents into individual Talend columns.
The component returns the document as a JSON string or serialized structure, and nested fields must be parsed explicitly.
Recommended approach:
Read the MongoDB document as a single JSON/Document field.
Use tExtractJSONFields, tMap, or custom Java logic to deserialize and extract required fields.
Define the output schema based on the extracted values.
Note:
Automatic schema mapping of nested JSON is not supported in Big Data Batch jobs; JSON parsing must be handled explicitly.
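As a rough illustration of the "custom Java logic" option, here is a minimal sketch that turns a Map-like value into a JSON string that a JSON extraction component could then consume. It assumes the Object column actually holds a java.util.Map at runtime (for example, org.bson.Document implements Map<String, Object>); that assumption has not been verified for this job. The class name JsonStringSketch and its toJson method are hypothetical helpers, not part of Talend or the MongoDB driver.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not a Talend or MongoDB API): renders Map/Iterable/scalar
// values as a JSON string. Assumes the input column holds a Map-like document.
class JsonStringSketch {

    static String toJson(Object value) {
        StringBuilder sb = new StringBuilder();
        write(value, sb);
        return sb.toString();
    }

    private static void write(Object value, StringBuilder sb) {
        if (value == null) {
            sb.append("null");
        } else if (value instanceof Number || value instanceof Boolean) {
            // Note: non-finite doubles (NaN, Infinity) are not handled here.
            sb.append(value);
        } else if (value instanceof Map) {
            sb.append('{');
            boolean first = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                if (!first) sb.append(',');
                first = false;
                quote(String.valueOf(e.getKey()), sb);
                sb.append(':');
                write(e.getValue(), sb);
            }
            sb.append('}');
        } else if (value instanceof Iterable) {
            sb.append('[');
            boolean first = true;
            for (Object item : (Iterable<?>) value) {
                if (!first) sb.append(',');
                first = false;
                write(item, sb);
            }
            sb.append(']');
        } else {
            // Strings and any other type (dates, ids, ...) fall back to a quoted toString().
            quote(String.valueOf(value), sb);
        }
    }

    private static void quote(String s, StringBuilder sb) {
        sb.append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        sb.append('"');
    }

    public static void main(String[] args) {
        // Stand-in for a document read from MongoDB.
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("name", "Ada");
        doc.put("tags", Arrays.asList("x", "y"));
        doc.put("age", 36);
        System.out.println(toJson(doc)); // prints {"name":"Ada","tags":["x","y"],"age":36}
    }
}
```

In a Java component such as tJavaRow, a call along the lines of output_row.json = JsonStringSketch.toJson(input_row.doc); could populate a String column; the column names json and doc are placeholders. If the runtime value turns out to be an org.bson.Document, calling its own toJson() method is likely simpler and handles MongoDB-specific types more faithfully than this sketch.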
Thanks,
Gourav