Solved: Extract Difference Data from two files - Qlik Community

vidya821 · ‎2017-09-15

Hi All,

Is it possible to extract the difference data from tFileCompare ?

One approach to compare is LookUp (but i have different set of files to compare and each set has different headers, so for lookup i need to modify the file schema everytime), is there any component to compare two files and get the difference data as output directly ?

Thanks

Anonymous · ‎2017-09-20

You can concatenate and hash in your Oracle query (which would help make it more dynamic). This way your job does not have to be aware of the schema, you just need to pass it an appropriate SQL query with the concatenation and hashing built in.

View solution in original post

Anonymous · ‎2017-09-15

Are you looking for a comparison character by character? So, for example, in the following examples, the differences would be as shown below....

aaaaaaaaaaaaaaaabbbbbbbbbbbbbbcccccccccccccccdddddddddddd
eeeeeeeeeeeeeeffffffffffffffffffffggggggggggggggggghhhhhhhhhhhhiiiiiiii

1aaaaaaaaaaaaaaaabbb2bbbbbbbbbbbcccccccccccccccdddddddddddd
eeeeeeeeeeeeeeffffffffffffffffffffggggggggg3gggggggghhhhhhhhhhhhiiiiiiii

If that is what you want, there is nothing "out of the box" and it might be quite tricky to build this using standard components. You could try the Talend Exchange or look for a Java API to handle this and call it from Talend.

vidya821 · ‎2017-09-15

No, i want to do row by row comparison with columns values check
E.g
File 1
A|B|C
1|1|1

File 2
A|B|C
1|1|1
2|2|2

So if i compare these files, the oputput file should be
A|B|C
2|2|2

Anonymous · ‎2017-09-17

Hello,

Where are your input files from? Tables? Are you looking for redundancy analysis in talend data quality prodcut?

Best regards

Sabrina

vidya821 · ‎2017-09-18

Yes, the inputs are from tables with same scehma but from two different databases (Oracle 9 and Oracle 12 respectively).
however i cannot connect both the databases in talend due to some concerns, but i can extract data files individually.
i need these data files to be compared so as to validate if all the data for few selected tables has migrated correctly from oracle 9 to Oracle 12

vidya821 · ‎2017-09-19

@xdshi,
Can you please tell me if below design is possible in Talend.
1) i have two database connections available in application (METADATA)
2) Both the databases have same tables with similar schémas (There are 30 or so tables )
3) I want to compare the data from the two corresponding tables
4) Can i design a job where i can send the name of the tables one after another and the job does data comparison and creates an output file with difference data rows
5) In this way i can avoid the need to create a new job for new table comaprison (for 30 tables i hve to create 30 jobs)

Anonymous · ‎2017-09-19

If the tables that are supplying the data have the same schema, you don't need to worry about the headers at all. Just join your two files using a tMap and ensure that every column that should be the same is joined. Then have two outputs; 1 for the matches and one for rows from the main that do not match

vidya821 · ‎2017-09-19

Thanks Rhall,
but now i have two tables directly and not file data, i know that we can join two oracle tables also and perform the operation but my job has to handle 30 set of tables.
is it possible to design one job that can be reused by every table set ?
The DB's are like these below
DB1 DB2
Table1-Schema1 Table1-Schema1
Table2-Schema2 Table2-Schema2
Table3-Schema3 Table3-Schema3
So i have to campare Table1 from DB1 and DB2, create an output file with uncommon data
after table1 the process should continue for Table2 and respectively

Anonymous · ‎2017-09-19

Yes, you can create one job for all tables.....but it will only tell you about rows that are exactly the same....and it will be complicated to build if you are new to this.

1) Input your data from your tables with ALL of the columns concatenated and hashed. Output this as a String (Varchar). You will need a primary key on the table to be output as well. So your data from each table will be ....

Key
ConcatenatedHash

2) In your tMap join on your ConcatenatedHash column. Remember that the Main flow will be the only flow where ALL rows are guaranteed to be tested. If you require both sides to be tested you will have to reverse the lookup in another tMap.

3) When you identify matches, you can link back to your unconcatenated data using the Key.

vidya821 · ‎2017-09-19

Yeah thats a bit of complex process, as per the requirement i was hpoing to implement the below oracle statement in Talend

select * from DB1-table1
MINUS
select * from DB2-table1

its okay if this is not possible via talend
Thanks

Extract Difference Data from two files

Talend Data Integration

v6.x