dipanjan93
Contributor

Extract Data from Oracle and Load Hive Tables using Batch Jobs

Hello Community,

 

I have a requirement where I need to extract data from an Oracle database to CSV files using a Standard Job; after extraction, these CSV files would be loaded into Hive tables using Big Data Batch Jobs. Currently, all the CSV files sit under an FTP server location.
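For the extraction step, the Standard Job essentially runs a query and streams the result set into a CSV file. Here is a minimal sketch of that logic, using Python's `sqlite3` purely as a stand-in for the Oracle connection (a real Talend job would use the Oracle driver and a database-input component); the table and column names are made up for illustration:

```python
import csv
import sqlite3  # stand-in for an Oracle connection in this sketch

def extract_to_csv(conn, query, out_path):
    """Run a query and write the result set, header row first, to a CSV file."""
    cur = conn.execute(query)
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([col[0] for col in cur.description])  # column headers
        writer.writerows(cur)  # stream rows without loading everything into memory

if __name__ == "__main__":
    # Hypothetical demo table standing in for an Oracle source table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])
    extract_to_csv(conn, "SELECT id, amount FROM orders", "orders.csv")
```

The same shape (query in, delimited file out) is what the Standard Job automates for you.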

 

I'm quite new to the Batch Job concept. Could you please help me here?

3 Replies
Anonymous
Not applicable

Hi,

 

     The batch loading concept is very similar to a DI batch load flow, but here we use the Spark engine to drive the flow, and the components change according to the Spark layout.

 

      Since you are using Hive to output the data, I would suggest going through the Big Data job properties for this component, and also through the sample job described in the link below.
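For context, loading a file that is already on HDFS into a Hive table ultimately comes down to a HiveQL `LOAD DATA INPATH` statement. A small helper sketching how such a statement is assembled (the path and table name below are illustrative, not from this thread):

```python
def build_hive_load(hdfs_path: str, table: str, overwrite: bool = False) -> str:
    """Build the HiveQL that moves a file already on HDFS into a Hive table."""
    mode = "OVERWRITE INTO" if overwrite else "INTO"
    return f"LOAD DATA INPATH '{hdfs_path}' {mode} TABLE {table}"

# Example: load a staged CSV into a hypothetical target table.
stmt = build_hive_load("/staging/orders.csv", "sales.orders", overwrite=True)
print(stmt)
# → LOAD DATA INPATH '/staging/orders.csv' OVERWRITE INTO TABLE sales.orders
```

The Hive components in the Big Data job generate and run this kind of statement for you; the table must already exist with a layout matching the CSV.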

 

https://help.talend.com/reader/KxVIhxtXBBFymmkkWJ~O4Q/xd_IX0AdYKc3dTF9akwRVw

 

Warm Regards,
Nikhil Thampi

Please appreciate our members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂

dipanjan93
Contributor
Author

Hi @nthampi,

 

Many thanks for the prompt response!

 

I went through the URL you shared. However, my requirement is slightly different. As I'm currently using Talend Big Data Enterprise v6.3.1, I don't see any FTP component when creating a Big Data Batch job (not sure if it's available in v7.1.1), so I'm unable to proceed: first of all, I have to fetch the CSV files from the FTP server.

 

Is there a way to handle this situation?

 

Thanks in advance!

 

Best Regards,

Dipanjan

 

Anonymous
Not applicable

@dipanjan93

 

You should not use a Big Data job for FTP processing. Use a DI job to perform the FTP activity; once that job is complete, a BD job can perform the further tasks.
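The DI-side FTP step is a plain FTP download of the staged files (in Talend, typically an FTP connection plus an FTP get component). A minimal sketch of that behaviour with Python's `ftplib`, assuming hypothetical host, credentials, and directories:

```python
import os
from ftplib import FTP

def select_files(names, suffix=".csv"):
    """Keep only the file names with the wanted suffix (CSV by default)."""
    return [n for n in names if n.endswith(suffix)]

def fetch_csv_files(host, user, password, remote_dir, local_dir):
    """Download all CSV files from a remote FTP directory into local_dir."""
    os.makedirs(local_dir, exist_ok=True)
    fetched = []
    with FTP(host) as ftp:           # hypothetical server details supplied by caller
        ftp.login(user=user, passwd=password)
        ftp.cwd(remote_dir)
        for name in select_files(ftp.nlst()):
            local_path = os.path.join(local_dir, name)
            with open(local_path, "wb") as f:
                ftp.retrbinary(f"RETR {name}", f.write)
            fetched.append(local_path)
    return fetched
```

In the actual flow the DI job's FTP components do this work; the sketch only mirrors the logic so the hand-off (files land locally, then get pushed to HDFS for the BD job) is clear.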

 

In the execution plan, you can orchestrate the DI and BD jobs so that they start one after another.
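The same sequencing can also be reproduced outside an execution plan by launching the built job scripts one after another and stopping the chain on the first failure. A sketch, with stand-in commands in place of the (hypothetical) exported job launchers:

```python
import subprocess
import sys

def run_in_sequence(commands):
    """Run each command in order; abort the chain on the first non-zero exit."""
    for cmd in commands:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return result.returncode  # stop: downstream jobs depend on this one
    return 0

if __name__ == "__main__":
    # Stand-ins for the exported job launchers (e.g. the jobs' run scripts):
    # first the DI job (FTP fetch + Oracle extract), then the BD job (Hive load).
    exit_code = run_in_sequence([
        [sys.executable, "-c", "print('DI job: FTP fetch + extract')"],
        [sys.executable, "-c", "print('BD job: Hive load')"],
    ])
    sys.exit(exit_code)
```

The execution plan gives you this ordering (plus scheduling and monitoring) without hand-written glue; the sketch just shows the dependency being enforced.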

 

Warm Regards,
Nikhil Thampi

Please appreciate our members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂