Anonymous
Not applicable

ETL is slow between Oracle to Vertica

Hi Everyone,

I have an Oracle-to-Vertica job built in the Talend open source version. I just have a tOracleInput --> tVerticaOutput. On the tOracleInput, I have "Use cursor" checked with the cursor size at 500,000. On the tVerticaOutput, I have "Use batch mode" selected and set to commit every 500,000 rows. Loading two million rows from Oracle to Vertica takes 3 to 4 minutes, which seems really long. If I dump the data from Oracle into a CSV file and then upload that file to Vertica, it takes seconds. Does anyone have experience with this? Am I missing some option I'm not aware of? Or is there a difference between Talend Open Source and Talend Open Studio that would make it faster? Thanks in advance.
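As an aside, the gap between row-by-row inserts and bulk loading can be illustrated outside Talend. The sketch below is purely illustrative: Talend generates Java/JDBC rather than Python, and the real cost difference over a network comes from per-statement round-trips, which an in-memory SQLite database does not have. The shape of the comparison still holds.

```python
import sqlite3
import time

# Illustrative only: compares one-INSERT-per-row against a batched
# submission, the same trade-off Talend's "Use batch mode" addresses.
rows = [(i, f"name-{i}") for i in range(100_000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER, name TEXT)")

# Row-by-row: one statement per row (roughly what you get without batch mode).
t0 = time.perf_counter()
for r in rows:
    conn.execute("INSERT INTO target VALUES (?, ?)", r)
conn.commit()
row_by_row = time.perf_counter() - t0

conn.execute("DELETE FROM target")
conn.commit()

# Batched: all rows submitted in one call (what batch mode approximates).
t0 = time.perf_counter()
conn.executemany("INSERT INTO target VALUES (?, ?)", rows)
conn.commit()
batched = time.perf_counter() - t0

count = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
print(count)  # 100000
```

Against a remote database, each unbatched statement additionally pays a network round-trip, which is why batch size matters far more there than in this local sketch.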

1 Solution

Accepted Solutions
TRF
Champion II

Hi,

 

Well, to confirm which part is slow (Oracle or Vertica), you can replace the tVerticaOutput with a simple tFileOutputDelimited.

Now, regarding the tVerticaOutput component, you also have to change the batch size (it's a separate setting from the commit interval).

But you may also switch to tVerticaOutputBulkExec, which (like all t<DB name>OutputBulkExec components) is dedicated to mass transfer.

Hope this helps.
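For context, the BulkExec-style components stage the rows to a delimited file and then hand the whole file to the database's bulk loader in one pass, rather than inserting row by row. A minimal sketch of that two-stage pattern follows; it uses SQLite, which has no COPY command, so `executemany` stands in for Vertica's bulk load here, and all names are illustrative.

```python
import csv
import os
import sqlite3
import tempfile

# Hypothetical sketch of the pattern a BulkExec component automates:
# 1) stage rows to a delimited file, 2) bulk-load the file in one pass.
source = [(1, "alpha"), (2, "beta"), (3, "gamma")]

# Step 1: stage to a delimited file (the tFileOutputDelimited step).
path = os.path.join(tempfile.mkdtemp(), "stage.csv")
with open(path, "w", newline="") as f:
    csv.writer(f, delimiter=";").writerows(source)

# Step 2: load the staged file in a single bulk operation
# (in Vertica this would be a COPY ... FROM LOCAL statement).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER, name TEXT)")
with open(path, newline="") as f:
    staged = [(int(i), n) for i, n in csv.reader(f, delimiter=";")]
conn.executemany("INSERT INTO target VALUES (?, ?)", staged)
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM target").fetchone()[0])  # 3
```

The staging file adds disk I/O, which is why the two-stage path tends to pay off on large tables but not necessarily on small ones.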


3 Replies

Anonymous
Not applicable
Author

Hi, 

Sorry for the delayed response. I tried your solution and did notice an uptick in speed of a couple of seconds. What I tested was outputting the Oracle data to a delimited file and then using tVerticaOutputBulkExec to upload the file. This method was faster for large tables, but for smaller tables it didn't seem to matter.

That said, I was trying to keep everything database to database instead of having to generate a file and then load it.

On our Vertica server, we installed a sqlplus add-on that can pull data from Oracle into Vertica much faster than Talend can. I was just curious whether I was missing anything when I set up the job. Thanks.

DriscollBunt
Contributor

Hi All,

I agree with this solution; it works well. In my environment, row processing is normally around ~100K rows/sec.

Oracle --> tMap --> Vertica (tVerticaOutputBulkExec)

cursor = 500K, staging with outfile, rejfile, and excfile.

Currently I'm looking into whether it's possible to do a partial commit using tVerticaOutputBulkExec for Vertica.

Please advise.