Skip to main content
Announcements
A fresh, new look for the Data Integration & Quality forums and navigation! Read more about what's changed.
cancel
Showing results for 
Search instead for 
Did you mean: 
sudarshank
Contributor III

How to read mongoDB collection array data using tMongoDBInput and tExtractJSONFields.

The data I want to read is in the new array (dependentQuestionResponse) in the same array (questionResponse) but

it is not working to extract data for array within the array.

I have tried to extract that data but it is worked for only main array.

Is any wrong in my job?.

Please see the attached screenshot also collectionData format.

Labels (3)
1 Solution

Accepted Solutions
Anonymous
Not applicable

Hi @sudarshan khandu​,

 

Sorry about the delay in responding here, I have been a little tied up with some other bits and pieces. However, I have quickly knocked up an example of what you could.

 

So this is a screenshot of the job. I chose to use a tFileInputRaw to load the file in....you are using MongDB, but it shouldn't matter. You will also see 3 tExtractJsonFields components, 2 tHashOutputs and a tHashInput in the second subjob. I will explain those. Finally you will see a tAggregateRow and a tSortRow. The tAggregateRow will be used to condense the data returned at the end (to just the data you want) and the tSortRow is used to simply order it so that you can see it working easily.

 

I chose a minimal selection of elements to return, you can add to these. I simply chose an element at each level in the nested arrays.

 

0695b00000bHJVFAA4.png 

The first tExtractJsonFields component can be seen below. Notice the "Mapping" section. This is configured to have columns for all of the data....even though I am not collecting it all here. In this component I am simply collecting "CommunityId" and the outer array "questionResponse".

0695b00000bHJLfAAO.png 

The second tExtractJSONFields component has the same mapping fields. But you will notice that "CommunityId" is not populated. This is because the value will simply be passed along from the previous component. That is how empty columns work in the tExtractJSONFields component. If there is a value supplied by the previous component and the field is left empty, it will simply pass the existing value along.

 

You will notice I select the "title" and the "dependentQuestionResponse" array here.

0695b00000bHJWSAA4.png 

The next component is a tHashOutput. This is used to collect and store the output from the previous component and then pass it on to the next. The schema of this is exactly the same as the tExtractJSONFields. Notice that append is ticked. This will be linked to the next tHashOutput component. Or rather, it will be linked to this.

0695b00000bHJX1AAO.pngNow we have the final tExtractJSONFields component. Notice the same schema and the empty "pass-through" columns in the mapping. This is retrieving the data from your innermost array. I am collecting the "question" field from the innermost array here.

0695b00000bHJXfAAO.pngThis is the second tHashOutput. Notice the "Link with a tHashOutput" and the "Component list" params here. These link this to the same memory as the first tHashOutput.

 

0695b00000bHJYEAA4.png 

At this point all of our data is collected....but rows will be duplicated. Why? Because we have the tHashOutput between what should be a single step in the row processing. So it gets the data from the first two tExtractJSONFields components and then the data from the last one. This will be duplicating some of it with bits and pieces aded. So what we do in this SubJob is return all of that data, then aggregate it. Notice the tHashInput is linked to the first tHashOutput used.

0695b00000bHJYxAAO.pngThe tAggregateRow component is used to aggregate the data and get rid of the duplicates/extraneous data. I am grouping by "CommunityId" and "title" here. You may have better fields to group by. The one field that may be there or be missing is the "question" field. So to get rid of rows that have "null" AND a value for "question" linked to the same "communityId" and "title", we set this field to the function of Max.

0695b00000bHJZCAA4.png 

The tSortRow didn't need to be showed. It was simply used to sort the data so that I could show the result below....

 

4ca44afgfgfgfae3b1a382bfg38cb|Comments|

4ca44afgfgfgfae3b1a382bfg38cb|Navigation|

4ca44afgfgfgfae3b1a382bfg38cb|Preference|

4ca44afgfgfgfae3b1a382bfg38cb|surveys|

7e7ba747d9ad46812eadd87af6af6a0e|Location|locationtype

7e7ba747d9ad46812eadd87af6af6a0e|Question1|

 

 

View solution in original post

5 Replies
Anonymous
Not applicable

If I understand this issue correctly, this is down to how you are using the tExtractJSONFields component. I believe (I will need to see an example of the whole JSON to give an example) that you will need two tExtractJSONFields for this. The first one to extract the data outside of your outer array and to iterate over the outer array and then another to iterate over the inner array.

 

If you can post an example of the JSON (without private data), I can work out what it is you will need to do.

sudarshank
Contributor III
Author

Hi @Richard Hall​,

Thank you for reply!

 

I have attached the sample collection data. I want to data like in below screenshot format in tLogRow using tmongodbinput.

 

0695b00000bHGD3AAO.png 

 

Anonymous
Not applicable

Hi @sudarshan khandu​,

 

Sorry about the delay in responding here, I have been a little tied up with some other bits and pieces. However, I have quickly knocked up an example of what you could.

 

So this is a screenshot of the job. I chose to use a tFileInputRaw to load the file in....you are using MongDB, but it shouldn't matter. You will also see 3 tExtractJsonFields components, 2 tHashOutputs and a tHashInput in the second subjob. I will explain those. Finally you will see a tAggregateRow and a tSortRow. The tAggregateRow will be used to condense the data returned at the end (to just the data you want) and the tSortRow is used to simply order it so that you can see it working easily.

 

I chose a minimal selection of elements to return, you can add to these. I simply chose an element at each level in the nested arrays.

 

0695b00000bHJVFAA4.png 

The first tExtractJsonFields component can be seen below. Notice the "Mapping" section. This is configured to have columns for all of the data....even though I am not collecting it all here. In this component I am simply collecting "CommunityId" and the outer array "questionResponse".

0695b00000bHJLfAAO.png 

The second tExtractJSONFields component has the same mapping fields. But you will notice that "CommunityId" is not populated. This is because the value will simply be passed along from the previous component. That is how empty columns work in the tExtractJSONFields component. If there is a value supplied by the previous component and the field is left empty, it will simply pass the existing value along.

 

You will notice I select the "title" and the "dependentQuestionResponse" array here.

0695b00000bHJWSAA4.png 

The next component is a tHashOutput. This is used to collect and store the output from the previous component and then pass it on to the next. The schema of this is exactly the same as the tExtractJSONFields. Notice that append is ticked. This will be linked to the next tHashOutput component. Or rather, it will be linked to this.

0695b00000bHJX1AAO.pngNow we have the final tExtractJSONFields component. Notice the same schema and the empty "pass-through" columns in the mapping. This is retrieving the data from your innermost array. I am collecting the "question" field from the innermost array here.

0695b00000bHJXfAAO.pngThis is the second tHashOutput. Notice the "Link with a tHashOutput" and the "Component list" params here. These link this to the same memory as the first tHashOutput.

 

0695b00000bHJYEAA4.png 

At this point all of our data is collected....but rows will be duplicated. Why? Because we have the tHashOutput between what should be a single step in the row processing. So it gets the data from the first two tExtractJSONFields components and then the data from the last one. This will be duplicating some of it with bits and pieces aded. So what we do in this SubJob is return all of that data, then aggregate it. Notice the tHashInput is linked to the first tHashOutput used.

0695b00000bHJYxAAO.pngThe tAggregateRow component is used to aggregate the data and get rid of the duplicates/extraneous data. I am grouping by "CommunityId" and "title" here. You may have better fields to group by. The one field that may be there or be missing is the "question" field. So to get rid of rows that have "null" AND a value for "question" linked to the same "communityId" and "title", we set this field to the function of Max.

0695b00000bHJZCAA4.png 

The tSortRow didn't need to be showed. It was simply used to sort the data so that I could show the result below....

 

4ca44afgfgfgfae3b1a382bfg38cb|Comments|

4ca44afgfgfgfae3b1a382bfg38cb|Navigation|

4ca44afgfgfgfae3b1a382bfg38cb|Preference|

4ca44afgfgfgfae3b1a382bfg38cb|surveys|

7e7ba747d9ad46812eadd87af6af6a0e|Location|locationtype

7e7ba747d9ad46812eadd87af6af6a0e|Question1|

 

 

sudarshank
Contributor III
Author

Thank you! @Richard Hall​ 

Anonymous
Not applicable

Not a problem. Hope it solved your issue 🙂