I need to create a job that ingests a list of tables by Sqooping data from a source RDBMS into Hadoop and then into Hive.
I put the list of tables in a file, then read and iterate over it to ingest each table.
Since I have 300+ tables to ingest, processing them one at a time in a single process would take too long, so I need to parallelize the work.
My current idea is that the job reads the list of tables and splits it into arrays of 10 tables each. Each array is then passed to a subjob for processing.
I have already implemented this logic in Spark Scala code (a simplified sketch is below). The problem is that we need to move it to a Talend job so that it is easier for the operations team to monitor and maintain, since Talend is the only tool they are familiar with, but I don't know how to implement this logic in Talend.
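For reference, here is a minimal sketch of the batching logic I need to reproduce. The real job runs under Spark; this plain-Scala version (assuming Scala 2.13 with the scala-parallel-collections module) just shows the shape of it. The file path, JDBC connection string, credentials, and Hive database name are all placeholders:

```scala
import scala.io.Source
import scala.sys.process._
import scala.collection.parallel.CollectionConverters._

object SqoopBatchIngest {
  def main(args: Array[String]): Unit = {
    // Read the table list, one table name per line (path is a placeholder)
    val tables = Source.fromFile("/path/to/tables.txt").getLines().toList

    // Split the ~300 tables into batches of 10
    val batches = tables.grouped(10).toList

    // Run the batches in parallel; each batch ingests its tables sequentially
    batches.par.foreach { batch =>
      batch.foreach { table =>
        // Shell out to Sqoop; connection details below are placeholders
        val cmd = Seq(
          "sqoop", "import",
          "--connect", "jdbc:oracle:thin:@//db-host:1521/ORCL",
          "--username", "etl_user",
          "--password-file", "/user/etl/.sqoop.pwd",
          "--table", table,
          "--hive-import",
          "--hive-database", "staging",
          "--num-mappers", "4"
        )
        val exitCode = cmd.!
        if (exitCode != 0)
          println(s"Sqoop import failed for table $table (exit $exitCode)")
      }
    }
  }
}
```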
I would appreciate any help. Thanks.