Anonymous
Not applicable

Compare Two Datasets (Txt File and MongoDB Documents) for Changes, Upload Changes.

I receive a data file from a client each month. The file is usually a txt or csv file, which my Talend job processes and writes to three MongoDB collections.

For context, say I receive a file with 2000 records on the 1st of the month, and then another file on the 15th. The second file has 3000 records, and some of the records from the first file have been altered in it or removed from it.

I would like to compare the current datasets (the three MongoDB collections) with the new dataset (the txt file) and update the Mongo collections with any value or status changes.

Any insight on how to do this would be greatly appreciated.

3 Replies
Anonymous
Not applicable
Author

Hello,

If you want to capture the changed data and load only those changes into the target table to keep it in sync, you can compare the two datasets with tMap.
The workflow should be:

new dataset (main) --------> tMap (inner join on your input, with "Catch lookup inner join reject" set to true) --> Mongo collections output
current datasets (lookup) -->

The reject output will contain the rows of the new dataset that have no match in the current datasets, that is, the changed data.
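
To illustrate what an inner join reject output holds, here is a minimal plain-Java sketch. It is not Talend-generated code: the class name, the method name, and the "id" key field are hypothetical, and each record is assumed to be a simple map of column name to value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of an inner join reject: rows of the main flow
// (the new file) whose join key finds no match in the lookup flow
// (the keys already stored in the current collection).
class JoinRejectSketch {

    static List<Map<String, String>> innerJoinRejects(
            List<Map<String, String>> mainRows,
            Set<String> lookupKeys,
            String keyField) {
        List<Map<String, String>> rejects = new ArrayList<>();
        for (Map<String, String> row : mainRows) {
            // No matching key in the lookup: the row is rejected by the inner join.
            if (!lookupKeys.contains(row.get(keyField))) {
                rejects.add(row);
            }
        }
        return rejects;
    }

    public static void main(String[] args) {
        List<Map<String, String>> newFile = List.of(
                Map.of("id", "1", "status", "open"),
                Map.of("id", "3", "status", "open"));
        Set<String> currentKeys = Set.of("1", "2");
        // Only the row with id 3 has no match in the current keys.
        System.out.println(innerJoinRejects(newFile, currentKeys, "id"));
    }
}
```

Which rows count as rejects depends on the columns chosen for the join: joining on the key alone flags only rows whose key is absent from the lookup.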

Best regards

Sabrina

Anonymous
Not applicable
Author

Using tMap will catch new and deleted records, but it won't catch whether an existing record has changed. I haven't used MongoDB, but typically databases have a way to compare two records and report any differences. If you have to do it yourself, the standard solution is to compute a hash of both records and compare those; I don't know if Talend has a component to do this (it won't be tHashInput, which has a different function), but you could certainly do it with tJava.
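
As a rough sketch of the hash comparison, the following plain Java could be adapted for a tJava or tJavaRow component. The class name, method names, and field names are hypothetical, records are assumed to be maps of column name to string value, and SHA-256 is just one reasonable choice of hash.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;

// Hypothetical helper: hash the compared fields of a record in a fixed order,
// then compare the hashes of the stored record and the incoming record.
class RecordHashSketch {

    static String hashRecord(Map<String, String> record, List<String> fields) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (String field : fields) {
                String value = record.get(field);
                // A missing field is treated like an empty string (an assumption).
                md.update((value == null ? "" : value).getBytes(StandardCharsets.UTF_8));
                // Separator byte so that ("ab", "c") and ("a", "bc") hash differently.
                md.update((byte) 0x1F);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not occur.
            throw new IllegalStateException(e);
        }
    }

    static boolean hasChanged(Map<String, String> stored,
                              Map<String, String> incoming,
                              List<String> fields) {
        return !hashRecord(stored, fields).equals(hashRecord(incoming, fields));
    }

    public static void main(String[] args) {
        List<String> fields = List.of("id", "status");
        Map<String, String> stored = Map.of("id", "1", "status", "open");
        Map<String, String> incoming = Map.of("id", "1", "status", "closed");
        System.out.println(hasChanged(stored, incoming, fields)); // prints true
    }
}
```

Storing the hash alongside each document would let later runs compare a single value per record instead of every field, at the cost of recomputing it whenever the set of compared fields changes.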

Anonymous
Not applicable
Author

There is a post just below yours that addresses this topic (turns out Talend does have a tHash component for this purpose):

https://community.talend.com/t5/Design-and-Development/Need-to-compare-file-data-with-db/m-p/143305#...