adbdkb (Creator)

Designing a Talend DI job to map a huge JSON input to another equally big output using tJSONDoc

I need to design a job that maps a huge JSON input to another JSON format. The input JSON has nested objects containing nested arrays, and I need to map the data to an output that contains some attributes from within those nested objects and arrays. I also have to map some attribute values in these objects from another JSON file (or files).
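 

Just to make the shape of the mapping concrete, here is a rough, non-Talend sketch of the kind of transformation I mean, written in plain Java with Jackson (all file and field names below are invented; my real structures are much bigger):

// Illustration only: copy selected attributes from a nested input, enrich from a second
// JSON file, and keep the nesting in the output.
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ArrayNode;
import com.fasterxml.jackson.databind.node.ObjectNode;

import java.io.File;

public class MappingSketch {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();

        // Hypothetical input and lookup files.
        JsonNode input  = mapper.readTree(new File("input.json"));
        JsonNode lookup = mapper.readTree(new File("lookup.json"));

        ObjectNode output = mapper.createObjectNode();

        // Copy a few attributes from a nested object, preserving the nesting in the output.
        ObjectNode customer = output.putObject("customer");
        customer.put("id",   input.path("header").path("customerId").asText());
        customer.put("name", input.path("header").path("customerName").asText());

        // Walk a nested array and map selected attributes of each element.
        ArrayNode items = output.putArray("items");
        for (JsonNode line : input.path("order").path("lines")) {
            ObjectNode item = items.addObject();
            item.put("sku", line.path("productCode").asText());
            item.put("qty", line.path("quantity").asInt());
            // Enrich from the second JSON file (here assumed to be keyed by productCode).
            item.put("description",
                     lookup.path(line.path("productCode").asText()).path("description").asText());
        }

        mapper.writerWithDefaultPrettyPrinter().writeValue(new File("output.json"), output);
    }
}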

 

This is just one job; once I get this working, I have to design many more like it.

 

I am very new to Talend and am trying to learn the tJSONDoc component at the same time.

 

I want to design this first job correctly, so that designing the other jobs will mostly be a matter of getting the mappings right.

@lli - could you give me some pointers on how to start my design using the tJSONDoc component? I would really appreciate some guidance on using the components properly. When I tried following an example I had found, even before getting to the arrays, I was not able to get the nested objects properly mapped in the output; everything ended up at the top level.

 

I will try to sanitise my input/output JSONs and attach them here, if that will help.

 

Thanks

AB

 

 

5 Replies
adbdkb (Creator, Author)

Adding the input and output JSON formats.

 

 


Talend-Input-Output.zip
Anonymous (Not applicable)

The first thing you should do is split this huge file into many smaller ones and process these smaller files, perhaps in parallel.

You can split huge JSON files with the help of the tJSONDocInputStream component and write the content row by row into smaller files.
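 

If the big file were, for example, a top-level JSON array of records, the row-by-row splitting would look roughly like this in plain Java with Jackson's streaming API (just to illustrate the idea, not the component itself; file names are made up):

// Illustration only: read one array element at a time so the whole file is never held in memory.
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.File;

public class SplitSketch {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        JsonFactory factory = mapper.getFactory();

        try (JsonParser parser = factory.createParser(new File("huge.json"))) {
            if (parser.nextToken() != JsonToken.START_ARRAY) {
                throw new IllegalStateException("Expected a top-level JSON array");
            }
            int i = 0;
            while (parser.nextToken() == JsonToken.START_OBJECT) {
                JsonNode record = mapper.readTree(parser);   // only this element is materialised
                mapper.writeValue(new File("part-" + (i++) + ".json"), record);
            }
        }
    }
}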

 

adbdkb (Creator, Author)

But this is one record, not multiple records, and this is how I will receive the input. This one record needs to be transformed into one output record; I have attached samples of both.

 

So how would I generically split the file into smaller ones in the job, process each of them, and then combine the outputs to create a single output record?
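 

Schematically, the combine step I have in mind would be something like this outside of Talend (plain Jackson, purely to show what I mean; the part file names are invented):

// Illustration only: merge the partial outputs of each processed chunk back into one record.
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

import java.io.File;

public class MergeSketch {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        ObjectNode combined = mapper.createObjectNode();

        // Hypothetical partial outputs, one per processed chunk of the input record.
        for (String part : new String[] {"part-header.json", "part-items.json", "part-totals.json"}) {
            ObjectNode chunk = (ObjectNode) mapper.readTree(new File(part));
            combined.setAll(chunk);   // copy each chunk's top-level fields into the single output record
        }

        mapper.writeValue(new File("output.json"), combined);
    }
}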

 

Thanks for any help you can provide.

 

Anonymous (Not applicable)

Strange file. A file with non-splittable content, where the output will be one record, and it is so huge that you get an out-of-memory error? Really?

I would love to see this file and your job design! I have never seen such a scenario.

Does the file contain fields with huge content?

adbdkb (Creator, Author)

I have attached the input file to this query. It has a lot of attributes and arrays of objects with many attributes; that is what makes the file huge. As I mentioned, I only started learning Talend last week, so I do not have a job design yet, and it was suggested to me that I use the tJSONDoc component. That is why I am seeking guidance on the forum on how to design the job. I do not have a design yet, but the file is attached.

 

Thanks