
Compare two tables field by field in different databases (Netezza & Hive)
Hi,
We are migrating data from a Netezza database to a Hive database. After the migration, we need to compare every table in Netezza with its counterpart in Hive to make sure no data has been lost. Can we create a job in Talend Open Studio for Big Data that compares two tables in different databases? If so, could you please walk me through the steps to create it? Thanks in advance.
Regards,
ntrayudu.

A quick and easy way to spot differences between tables is to hash all of the columns in each row of both tables and compare the hashes. If the tables share a unique key, output each row as essentially two columns: the key field and a hash of the remaining columns concatenated together. This lets you quickly find the rows that differ, and you can then examine those rows in more detail to identify which columns are different.
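As a rough illustration of the comparison step, here is a minimal Java sketch. It assumes each side has already been reduced to a map from key to row hash; the class name `HashDiff` and its fields are hypothetical and are not part of Talend. It only shows the logic, since holding whole tables in memory would not be practical for large tables, where the same comparison would have to run per partition or key range.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: compares two key -> row-hash maps.
final class HashDiff {
    final Set<String> missingInTarget = new TreeSet<>(); // keys only in the source
    final Set<String> missingInSource = new TreeSet<>(); // keys only in the target
    final Set<String> changed = new TreeSet<>();         // keys whose hashes differ

    static HashDiff compare(Map<String, String> source, Map<String, String> target) {
        HashDiff diff = new HashDiff();
        for (Map.Entry<String, String> entry : source.entrySet()) {
            String targetHash = target.get(entry.getKey());
            if (targetHash == null) {
                diff.missingInTarget.add(entry.getKey());
            } else if (!targetHash.equals(entry.getValue())) {
                diff.changed.add(entry.getKey());
            }
        }
        for (String key : target.keySet()) {
            if (!source.containsKey(key)) {
                diff.missingInSource.add(key);
            }
        }
        return diff;
    }
}
```

The keys in `changed` are the rows worth examining column by column; the other two sets are rows that exist on one side only.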

Note: Some of the tables hold around 300 crore (3 billion) rows.
What is the best way to do this kind of validation for tables of that size?

Talend is essentially a tool that generates Java for you, so it is easy to bring in your own (or other people's) Java classes and methods. For hashing, take a look here: http://www.codejava.net/coding/how-to-calculate-md5-and-sha-hash-values-in-java. There are other sources online as well.
To convert the rows, simply read them in as normal, concatenate the columns in a tMap (or tJavaFlex, tJavaRow, etc.) and apply your hashing technique to them there. After that, it is simply a case of comparing hash strings.
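A minimal sketch of such a hashing routine, using the JDK's `java.security.MessageDigest`, might look like the following. The class name `RowHasher`, the delimiter, and the null marker are assumptions made for illustration; a class like this could be added as a custom routine and called from a tMap expression or a tJavaRow.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical routine: concatenates column values and returns an MD5 hex string.
final class RowHasher {
    // Assumed delimiter; it keeps ("ab", "c") distinct from ("a", "bc").
    private static final char DELIMITER = '\u0001';
    // Assumed marker so that a null column differs from an empty string.
    private static final String NULL_MARKER = "\\N";

    static String hashRow(Object... columns) {
        StringBuilder row = new StringBuilder();
        for (Object column : columns) {
            row.append(column == null ? NULL_MARKER : column.toString());
            row.append(DELIMITER);
        }
        return md5Hex(row.toString());
    }

    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // two lowercase hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }
}
```

One caveat: the hashes only match if both sides produce identical strings for the same value, so dates, decimals, and trailing spaces would probably need to be normalized before hashing, since the Netezza and Hive drivers may format them differently.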
