
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
How to read mongoDB collection array data using tMongoDBInput and tExtractJSONFields.
The data I want to read is in the new array (dependentQuestionResponse) in the same array (questionResponse) but
it is not working to extract data for array within the array.
I have tried to extract that data but it is worked for only main array.
Is any wrong in my job?.
Please see the attached screenshot also collectionData format.
Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi @sudarshan khandu,
Sorry about the delay in responding here, I have been a little tied up with some other bits and pieces. However, I have quickly knocked up an example of what you could.
So this is a screenshot of the job. I chose to use a tFileInputRaw to load the file in....you are using MongDB, but it shouldn't matter. You will also see 3 tExtractJsonFields components, 2 tHashOutputs and a tHashInput in the second subjob. I will explain those. Finally you will see a tAggregateRow and a tSortRow. The tAggregateRow will be used to condense the data returned at the end (to just the data you want) and the tSortRow is used to simply order it so that you can see it working easily.
I chose a minimal selection of elements to return, you can add to these. I simply chose an element at each level in the nested arrays.
The first tExtractJsonFields component can be seen below. Notice the "Mapping" section. This is configured to have columns for all of the data....even though I am not collecting it all here. In this component I am simply collecting "CommunityId" and the outer array "questionResponse".
The second tExtractJSONFields component has the same mapping fields. But you will notice that "CommunityId" is not populated. This is because the value will simply be passed along from the previous component. That is how empty columns work in the tExtractJSONFields component. If there is a value supplied by the previous component and the field is left empty, it will simply pass the existing value along.
You will notice I select the "title" and the "dependentQuestionResponse" array here.
The next component is a tHashOutput. This is used to collect and store the output from the previous component and then pass it on to the next. The schema of this is exactly the same as the tExtractJSONFields. Notice that append is ticked. This will be linked to the next tHashOutput component. Or rather, it will be linked to this.
Now we have the final tExtractJSONFields component. Notice the same schema and the empty "pass-through" columns in the mapping. This is retrieving the data from your innermost array. I am collecting the "question" field from the innermost array here.
This is the second tHashOutput. Notice the "Link with a tHashOutput" and the "Component list" params here. These link this to the same memory as the first tHashOutput.
At this point all of our data is collected....but rows will be duplicated. Why? Because we have the tHashOutput between what should be a single step in the row processing. So it gets the data from the first two tExtractJSONFields components and then the data from the last one. This will be duplicating some of it with bits and pieces aded. So what we do in this SubJob is return all of that data, then aggregate it. Notice the tHashInput is linked to the first tHashOutput used.
The tAggregateRow component is used to aggregate the data and get rid of the duplicates/extraneous data. I am grouping by "CommunityId" and "title" here. You may have better fields to group by. The one field that may be there or be missing is the "question" field. So to get rid of rows that have "null" AND a value for "question" linked to the same "communityId" and "title", we set this field to the function of Max.
The tSortRow didn't need to be showed. It was simply used to sort the data so that I could show the result below....
4ca44afgfgfgfae3b1a382bfg38cb|Comments|
4ca44afgfgfgfae3b1a382bfg38cb|Navigation|
4ca44afgfgfgfae3b1a382bfg38cb|Preference|
4ca44afgfgfgfae3b1a382bfg38cb|surveys|
7e7ba747d9ad46812eadd87af6af6a0e|Location|locationtype
7e7ba747d9ad46812eadd87af6af6a0e|Question1|

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
If I understand this issue correctly, this is down to how you are using the tExtractJSONFields component. I believe (I will need to see an example of the whole JSON to give an example) that you will need two tExtractJSONFields for this. The first one to extract the data outside of your outer array and to iterate over the outer array and then another to iterate over the inner array.
If you can post an example of the JSON (without private data), I can work out what it is you will need to do.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi @Richard Hall,
Thank you for reply!
I have attached the sample collection data. I want to data like in below screenshot format in tLogRow using tmongodbinput.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi @sudarshan khandu,
Sorry about the delay in responding here, I have been a little tied up with some other bits and pieces. However, I have quickly knocked up an example of what you could.
So this is a screenshot of the job. I chose to use a tFileInputRaw to load the file in....you are using MongDB, but it shouldn't matter. You will also see 3 tExtractJsonFields components, 2 tHashOutputs and a tHashInput in the second subjob. I will explain those. Finally you will see a tAggregateRow and a tSortRow. The tAggregateRow will be used to condense the data returned at the end (to just the data you want) and the tSortRow is used to simply order it so that you can see it working easily.
I chose a minimal selection of elements to return, you can add to these. I simply chose an element at each level in the nested arrays.
The first tExtractJsonFields component can be seen below. Notice the "Mapping" section. This is configured to have columns for all of the data....even though I am not collecting it all here. In this component I am simply collecting "CommunityId" and the outer array "questionResponse".
The second tExtractJSONFields component has the same mapping fields. But you will notice that "CommunityId" is not populated. This is because the value will simply be passed along from the previous component. That is how empty columns work in the tExtractJSONFields component. If there is a value supplied by the previous component and the field is left empty, it will simply pass the existing value along.
You will notice I select the "title" and the "dependentQuestionResponse" array here.
The next component is a tHashOutput. This is used to collect and store the output from the previous component and then pass it on to the next. The schema of this is exactly the same as the tExtractJSONFields. Notice that append is ticked. This will be linked to the next tHashOutput component. Or rather, it will be linked to this.
Now we have the final tExtractJSONFields component. Notice the same schema and the empty "pass-through" columns in the mapping. This is retrieving the data from your innermost array. I am collecting the "question" field from the innermost array here.
This is the second tHashOutput. Notice the "Link with a tHashOutput" and the "Component list" params here. These link this to the same memory as the first tHashOutput.
At this point all of our data is collected....but rows will be duplicated. Why? Because we have the tHashOutput between what should be a single step in the row processing. So it gets the data from the first two tExtractJSONFields components and then the data from the last one. This will be duplicating some of it with bits and pieces aded. So what we do in this SubJob is return all of that data, then aggregate it. Notice the tHashInput is linked to the first tHashOutput used.
The tAggregateRow component is used to aggregate the data and get rid of the duplicates/extraneous data. I am grouping by "CommunityId" and "title" here. You may have better fields to group by. The one field that may be there or be missing is the "question" field. So to get rid of rows that have "null" AND a value for "question" linked to the same "communityId" and "title", we set this field to the function of Max.
The tSortRow didn't need to be showed. It was simply used to sort the data so that I could show the result below....
4ca44afgfgfgfae3b1a382bfg38cb|Comments|
4ca44afgfgfgfae3b1a382bfg38cb|Navigation|
4ca44afgfgfgfae3b1a382bfg38cb|Preference|
4ca44afgfgfgfae3b1a382bfg38cb|surveys|
7e7ba747d9ad46812eadd87af6af6a0e|Location|locationtype
7e7ba747d9ad46812eadd87af6af6a0e|Question1|

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thank you! @Richard Hall

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Not a problem. Hope it solved your issue 🙂
