I/p: Two text files, each with 5 columns
Schema: Both files have the same schema
O/p: Need to compare all 5 columns for each row and assign a flag if any of the columns does not match.
Approach taken:
Using tMap joined on a key, checking each column with the .equals() method and assigning a value to a flag based on the outcome of the comparison.
Issue:
Not able to write the expression that implements this approach.
Ask:
Need pointers on writing the expression, or a suggestion if I should take another approach.
Hi,
If my understanding of your use case is correct, you are trying to match the records from two files with the same schema, and you want to pick out the matched and unmatched records between the main flow and the lookup flow.
If my assumption is right, then the easiest way to do the matching is to add an inner join between the columns of the main and lookup flows. Matched records (where the match is done on all 5 columns) can be sent to an output flow, and there you can set the flag value to "Y". You can separate all records rejected by the inner join into another output flow by enabling the inner join reject option for that flow; there you can set the flag value to "N".
You can store the output in separate files or in tHashOutput for later processing. If you want to merge the outputs, read the data from both output flows in a separate subjob and combine them using the tUnite component.
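The "Y"/"N" flag described above can be sketched as plain Java; this is only an illustration of the comparison logic, with column values and flag values as placeholders, not Talend-generated code:

```java
// Sketch of a row-level "Y"/"N" match flag over five String columns.
// Column layout and flag values are illustrative assumptions.
import java.util.Objects;

public class MatchFlag {
    // Returns "Y" when all five columns match, "N" otherwise.
    // Objects.equals is null-safe, unlike calling .equals() on a possibly-null column.
    public static String flag(String[] row1, String[] row2) {
        for (int i = 0; i < 5; i++) {
            if (!Objects.equals(row1[i], row2[i])) {
                return "N";
            }
        }
        return "Y";
    }

    public static void main(String[] args) {
        String[] a = {"1", "Alice", "NY", "10", "A"};
        String[] b = {"1", "Alice", "NY", "10", "A"};
        String[] c = {"1", "Alice", "LA", "10", "A"};
        System.out.println(flag(a, b)); // prints Y
        System.out.println(flag(a, c)); // prints N
    }
}
```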
Warm Regards,
Nikhil Thampi
Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂
Hi Nikhil,
I think it needs a little more complicated logic than just the inner join lookup (catch rejected):
so first we need to clarify the goal and the environment, and only then think about how best to approach it.
Thanks Vapukov,
The intention is to implement CDC. I have the Open Studio version, which doesn't have the CDC components.
Moreover, CDC needs to be implemented on files; no DB is involved.
I am trying to identify the unmatched records from both files in a single job.
I could implement the logic to get the unmatched records from one file using the option Nikhil provided, but not from both input files.
Also, by creating two subjobs I could get the unmatched records from both files, but what should I do to check whether there is an update in any column other than the key field?
I am thinking of writing a variable in tMap and comparing each column using some function (I don't know which function I should use here for the comparison).
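A tMap variable takes a single Java expression, so the comparison asked about above can be written as one null-safe ternary. A sketch under the assumption of three non-key String columns (the `r1cX`/`r2cX` names stand in for tMap's `row1.colX`/`row2.colX` and are illustrative):

```java
// Sketch of the single Java expression a tMap Var could hold for an update flag.
// Column names and the "U"/"N" flag values are illustrative assumptions.
import java.util.Objects;

public class UpdateFlagVar {
    public static String updateFlag(String r1c1, String r1c2, String r1c3,
                                    String r2c1, String r2c2, String r2c3) {
        // The expression itself: "U" (updated) if any non-key column differs,
        // using Objects.equals for null-safety instead of colX.equals(...).
        return (Objects.equals(r1c1, r2c1)
                && Objects.equals(r1c2, r2c2)
                && Objects.equals(r1c3, r2c3)) ? "N" : "U";
    }

    public static void main(String[] args) {
        System.out.println(updateFlag("a", "b", "c", "a", "b", "c")); // prints N
        System.out.println(updateFlag("a", "b", "c", "a", "b", "X")); // prints U
    }
}
```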
Let's talk about "CDC"
🙂
for what?!!!!!
seriously - without understanding the full picture, it is not possible to design a proper solution!
does your program (application, service, etc.) work with plain CSV files?
example 1 - pure CDC:
- MySQL, PostgreSQL - proper CDC technologies are already available on the market for free
- SQL Server, Oracle, etc. - CDC comes only in the enterprise editions, but it can easily be implemented with triggers
catch the transactions (filtered by table if needed), send them to Kafka, and parse them with any number of subscribers
example 2:
load both files into any database (including a local SQLite) and the whole CDC between the 2 files is just 3 SQL queries
example 3:
use a JDBC driver for CSV - again, 3 simple queries
example 4:
pure Talend - 3 flows:
- find ids present in the 1st file but not in the 2nd - for INSERT: 1st file as main, 2nd as lookup, INNER JOIN, catch rejects
- find ids present in the 2nd file but not in the 1st - for DELETE: 2nd file as main, 1st as lookup, INNER JOIN, catch rejects
- INNER JOIN in tMap by id and filter where
(row1.col1 != row2.col1 || row1.col2 != row2.col2 || row1.col3 != row2.col3 || row1.col4 != row2.col4 || row1.col5 != row2.col5)
instead of != you can use .equals() for Strings, etc.
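A note on that last point: for String columns `!=` compares references, not values, so the example-4 filter can be sketched with `Objects.equals` instead, which is also null-safe. The array-based column access here is an illustrative stand-in for `row1.colX`/`row2.colX`:

```java
// Null-safe version of the example-4 UPDATE filter over five String columns.
// Array indices stand in for row1.colX / row2.colX; layout is an assumption.
import java.util.Objects;

public class ChangedRowFilter {
    // True when the two rows (already joined by id) differ in any column.
    public static boolean changed(String[] row1, String[] row2) {
        return !Objects.equals(row1[0], row2[0])
            || !Objects.equals(row1[1], row2[1])
            || !Objects.equals(row1[2], row2[2])
            || !Objects.equals(row1[3], row2[3])
            || !Objects.equals(row1[4], row2[4]);
    }

    public static void main(String[] args) {
        String[] a = {"1", "x", "y", "z", "w"};
        String[] b = {"1", "x", "y", "z", "w"};
        String[] c = {"1", "x", "CHANGED", "z", "w"};
        System.out.println(changed(a, b)); // prints false
        System.out.println(changed(a, c)); // prints true
    }
}
```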