<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Speed Up Processing From tOracleInput to tHDFSOutput in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/ProcessingSpeed-Up-Processing-From-tOracleInput-to-tHDFSOutput/m-p/2296243#M68927</link>
    <description>What if I want to load the data from Oracle into memory first and then process it there?
&lt;BR /&gt;If I use Sqoop, I have to land the data in HDFS first and then read it back, so two I/O operations are involved. If I use tOracleInput instead, the data comes into memory, I can process it directly, and then load the result into HDFS.
&lt;BR /&gt;
&lt;BR /&gt;Which approach do you think is better?</description>
    <pubDate>Mon, 12 Dec 2016 05:18:09 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2016-12-12T05:18:09Z</dc:date>
    <item>
      <title>Speed Up Processing From tOracleInput to tHDFSOutput</title>
      <link>https://community.qlik.com/t5/Talend-Studio/ProcessingSpeed-Up-Processing-From-tOracleInput-to-tHDFSOutput/m-p/2296241#M68925</link>
      <description>&lt;P&gt;I am running a job that pulls data from an Oracle DB on a remote server and pushes that table into HDFS. The best throughput I am getting is 4,100 rows per second, and there are 53 million rows in total. I have six tables like that.&lt;BR /&gt;I have set the JVM options&lt;BR /&gt;-Xms16g | -Xmx32g&lt;BR /&gt;What can I do to increase the performance? At this rate, it will take over 12 hours to load all the data.&lt;BR /&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Sat, 16 Nov 2024 10:13:15 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/ProcessingSpeed-Up-Processing-From-tOracleInput-to-tHDFSOutput/m-p/2296241#M68925</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-11-16T10:13:15Z</dc:date>
    </item>
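    <!-- A quick sanity check of the throughput figures quoted in the question above,
         using only the poster's own numbers (53 million rows per table, six tables,
         4,100 rows per second); at that rate the full load takes even longer than
         the 12 hours feared:

```python
# Figures quoted in the question above.
rows_per_table = 53_000_000
rows_per_second = 4_100
tables = 6

# Wall-clock time at the observed throughput.
hours_per_table = rows_per_table / rows_per_second / 3600
total_hours = hours_per_table * tables

print(round(hours_per_table, 1))  # about 3.6 hours per table
print(round(total_hours, 1))      # about 21.5 hours for all six tables
```

    This is plain arithmetic, not a benchmark; it only shows that raising the
    per-second row rate is the lever that matters, not the JVM heap size. -->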
    <item>
      <title>Re: Speed Up Processing From tOracleInput to tHDFSOutput</title>
      <link>https://community.qlik.com/t5/Talend-Studio/ProcessingSpeed-Up-Processing-From-tOracleInput-to-tHDFSOutput/m-p/2296242#M68926</link>
      <description>Hi,&lt;BR /&gt;Usually we use tSqoopImport to load data into HDFS from a relational database management system (RDBMS) such as MySQL or Oracle.&lt;BR /&gt;Please take a look at the component reference: &lt;A href="https://help.talend.com/search/all?query=tSqoopImport" target="_blank" rel="nofollow noopener noreferrer"&gt;TalendHelpCenter:tSqoopImport&lt;/A&gt;.&lt;BR /&gt;Best regards,&lt;BR /&gt;Sabrina</description>
      <pubDate>Fri, 09 Dec 2016 10:21:20 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/ProcessingSpeed-Up-Processing-From-tOracleInput-to-tHDFSOutput/m-p/2296242#M68926</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2016-12-09T10:21:20Z</dc:date>
    </item>
    <item>
      <title>Re: Speed Up Processing From tOracleInput to tHDFSOutput</title>
      <link>https://community.qlik.com/t5/Talend-Studio/ProcessingSpeed-Up-Processing-From-tOracleInput-to-tHDFSOutput/m-p/2296243#M68927</link>
      <description>What if I want to load the data from Oracle into memory first and then process it there?
&lt;BR /&gt;If I use Sqoop, I have to land the data in HDFS first and then read it back, so two I/O operations are involved. If I use tOracleInput instead, the data comes into memory, I can process it directly, and then load the result into HDFS.
&lt;BR /&gt;
&lt;BR /&gt;Which approach do you think is better?</description>
      <pubDate>Mon, 12 Dec 2016 05:18:09 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/ProcessingSpeed-Up-Processing-From-tOracleInput-to-tHDFSOutput/m-p/2296243#M68927</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2016-12-12T05:18:09Z</dc:date>
    </item>
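    <!-- The tradeoff discussed above (row-at-a-time reads vs. batched reads) is
         the usual first lever for tOracleInput throughput: its advanced "cursor"
         setting corresponds to the JDBC fetch size, i.e. how many rows return per
         client/server round trip (the mapping to JDBC fetch size is stated here
         as an assumption). A minimal Python sketch of the batching idea, with an
         in-memory stand-in for the database and no real Oracle connection:

```python
from itertools import islice

def fetch_in_batches(rows, batch_size):
    """Yield rows in batches of batch_size, mimicking a cursor fetch size.
    Each batch stands in for one client/server round trip."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

rows = range(100_000)  # stand-in for an Oracle result set
round_trips = sum(1 for _ in fetch_in_batches(rows, 10_000))
print(round_trips)  # 10 round trips instead of 100,000 single-row fetches
```

    In a real job the same effect comes from raising the fetch/cursor size so
    each round trip amortizes network latency over many rows; values in the
    1,000 to 10,000 range are a common starting point, tuned against memory. -->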
  </channel>
</rss>

