Anonymous
Not applicable

Oracle Dynamic flow to handle 400 tables

Hi,
I am struggling to design a job and am looking for your expertise on how you have handled similar work in your projects.
Source: Oracle DB (405 tables)
Target: Hive
Need a dynamic flow to handle the 400+ tables with a minimum number of jobs.
Transformations involved:
1. Around 300 tables are full load
2. Around 100 tables are incremental load
3. 12 tables need data masking functionality
I really do not want to create 400+ jobs to tackle this situation.


Any Ideas ?
Thanks,
SK

3 Replies
Anonymous
Not applicable
Author

I am in the same boat, but I need to load flat files into Hive.
For Oracle to Hive, you can make use of the Oracle data dictionary (information) tables and the Hive metadata tables to create the sqoop commands dynamically. This should help you.
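For what it's worth, here is a rough sketch of that idea outside Talend, assuming the python-oracledb driver, a source schema called SRC_SCHEMA, and placeholder connection and Hive details; inside a Talend job the generated command string could be fed to something like a tSystem or tSqoopImport step instead:

# Build one sqoop import command per table listed in Oracle's data dictionary.
import oracledb

conn = oracledb.connect(user="etl_user", password="changeme", dsn="orahost/ORCLPDB")
cur = conn.cursor()
cur.execute("SELECT table_name FROM all_tables WHERE owner = :owner", owner="SRC_SCHEMA")

commands = []
for (table_name,) in cur:
    commands.append(
        "sqoop import "
        "--connect jdbc:oracle:thin:@//orahost:1521/ORCLPDB "
        "--username etl_user --password-file /user/etl/.orapwd "
        f"--table SRC_SCHEMA.{table_name} "
        "--hive-import --hive-database staging "
        f"--hive-table {table_name.lower()} -m 4"
    )

# Write the generated commands out as a shell script (or hand them to a scheduler).
with open("sqoop_full_load.sh", "w") as f:
    f.write("\n".join(commands) + "\n")

The incremental tables would need extra flags (e.g. sqoop's --incremental/--check-column options) driven from the same metadata.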
Anonymous
Not applicable
Author

I was thinking of doing it this way:
 
Flow:
Oracle table > local file > HDFS > Hive
 
1. Oracle DB > local file
Tables loaded with no transformation required: the Dynamic Schema feature can be used.
Tables where data masking is required: these need a one-to-one flow. No dynamic approach is possible, as the particular columns need to be specified.
 
2. Local file > HDFS
The path of the local file can be passed dynamically by storing it in a context variable.
 
3. HDFS > Hive
a. A master table containing the column details, or the SQL/DDL query itself, needs to be created.
b. Talend will read this table.
c. A Hive table will be created for each table mentioned in the records, with the help of the DDL query.
 
From there, Talend will iterate one row at a time, each row carrying a table name and its SQL query.
The tHiveCreateTable component will read the tables and queries one by one and execute them (a sketch of this loop follows below).
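A minimal sketch of that step 3 loop outside Talend, assuming the master table has been exported to a CSV with table_name and ddl_query columns and that beeline can reach HiveServer2 (the URL below is a placeholder); in the job itself this is what the iterate link into tHiveCreateTable/tHiveRow does:

# Read the master/control table and execute each stored DDL statement against Hive.
import csv
import subprocess

BEELINE_URL = "jdbc:hive2://hiveserver:10000/staging"   # placeholder HiveServer2 URL

with open("master_table.csv", newline="") as f:
    # expected columns: table_name, ddl_query
    for row in csv.DictReader(f):
        print(f"Creating {row['table_name']} ...")
        subprocess.run(["beeline", "-u", BEELINE_URL, "-e", row["ddl_query"]], check=True)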
 
Challenges of this strategy:
Talend's dynamic schema feature works only up to the creation of the file from the DB. For the Hive table creation, a rather static master table has to be referenced, which cannot accommodate changes on the fly the way Talend's dynamic feature can.
 
Do you have any better idea to handle our situation?
BR,
SK
 
Anonymous
Not applicable
Author

I guess you can eliminate a couple of these steps by using sqoop. Talend Big Data does support sqoop (not sure which flavor of Talend you are using).
I have not used dynamic schema, so I am not sure about it. Can you please share a sample job for the same?
For the load of data from file to Hive, I guess your approach is correct. What you can do is add parallelism so that multiple tables load in parallel. Since the structure of the tables matches what is in the files, you need not do any validation either. I guess your approach is pretty much on target here.
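A rough sketch of that parallelism, assuming sqoop is on the PATH, the table list comes from the metadata query mentioned earlier, and the connection string and worker count are placeholders to tune:

# Run several sqoop imports at once, capped by a small worker pool.
# sqoop's own -m flag still controls the mappers used within each import.
import subprocess
from concurrent.futures import ThreadPoolExecutor

TABLES = ["CUSTOMERS", "ORDERS", "ORDER_ITEMS"]   # would come from ALL_TABLES in practice

def run_import(table):
    cmd = (
        "sqoop import "
        "--connect jdbc:oracle:thin:@//orahost:1521/ORCLPDB "
        "--username etl_user --password-file /user/etl/.orapwd "
        f"--table SRC_SCHEMA.{table} "
        "--hive-import --hive-database staging -m 4"
    )
    return table, subprocess.run(cmd.split(), check=False).returncode

# Four tables loading at any one time; adjust to what the cluster and Oracle can absorb.
with ThreadPoolExecutor(max_workers=4) as pool:
    for table, rc in pool.map(run_import, TABLES):
        print(table, "ok" if rc == 0 else f"failed (rc={rc})")

In Talend itself the same effect comes from parallel iterate links or the job's multi-thread execution setting, but a plain driver script shows the idea.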