
ETL is slow from Oracle to Vertica
Hi Everyone,
I have an Oracle-to-Vertica job built in the Talend open source version. It is just a tOracleInput --> tVerticaOutput. On the tOracleInput, I have "Use cursor" checked with the cursor size set to 500,000. On the tVerticaOutput, I have "Use batch mode" selected and commit set to every 500,000 rows. Loading two million rows from Oracle to Vertica takes 3 to 4 minutes, which I think is really long. If I dump the data from Oracle into a CSV file and then upload that file to Vertica, it takes seconds. Does anyone have experience with this? Am I missing some option I'm not aware of? Or is there a difference between Talend Open Source and Talend Open Studio that would make it faster? Thanks in advance.
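For context on why the row-by-row path is so much slower than the CSV route: even with batch mode, every row still travels to Vertica as a parameter set of an INSERT statement. A minimal JDBC sketch of that pattern, roughly what such a job does under the hood (the connection strings, table, and column names are made up for illustration, and Talend's generated code differs in detail):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class OracleToVerticaBatch {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection strings, table, and column names.
        try (Connection ora = DriverManager.getConnection(
                "jdbc:oracle:thin:@//ora-host:1521/ORCL", "user", "pass");
             Connection vtc = DriverManager.getConnection(
                "jdbc:vertica://vtc-host:5433/db", "user", "pass");
             Statement src = ora.createStatement()) {

            src.setFetchSize(500_000); // mirrors the "Use cursor" setting
            ResultSet rs = src.executeQuery("SELECT id, name FROM src_table");

            vtc.setAutoCommit(false);
            PreparedStatement ins = vtc.prepareStatement(
                    "INSERT INTO tgt_table (id, name) VALUES (?, ?)");

            long n = 0;
            while (rs.next()) {
                ins.setLong(1, rs.getLong(1));
                ins.setString(2, rs.getString(2));
                ins.addBatch();
                if (++n % 10_000 == 0) ins.executeBatch(); // flush the JDBC batch
                if (n % 500_000 == 0) vtc.commit();        // mirrors "commit every"
            }
            ins.executeBatch();
            vtc.commit();
        }
    }
}
```

Even with large batches and infrequent commits, this path pays per-row INSERT overhead on the Vertica side, which is why a file dump followed by a bulk COPY tends to be dramatically faster.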
Accepted Solutions

Hi,
Well, to confirm which part is slow (Oracle or Vertica), you can replace the tVerticaOutput with a simple tFileOutputDelimited.
Regarding the tVerticaOutput component, you also have to change the batch size, not just the commit interval.
But you may also switch to tVerticaOutputBulkExec, which (like all t<DB name>OutputBulkExec components) is dedicated to mass transfer.
Hope this helps.
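For reference, tVerticaOutputBulkExec stages the rows to a file and then runs Vertica's COPY. A minimal hand-rolled sketch of the same idea over JDBC, assuming a staging file already exists (the connection string, table name, and file path are hypothetical):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class VerticaBulkLoad {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection string, table, and staging file.
        try (Connection vtc = DriverManager.getConnection(
                "jdbc:vertica://vtc-host:5433/db", "user", "pass");
             Statement st = vtc.createStatement()) {
            // COPY ... FROM LOCAL streams the client-side file through the
            // driver and loads it in bulk, bypassing per-row INSERT overhead.
            int loaded = st.executeUpdate(
                "COPY tgt_table FROM LOCAL '/tmp/staging.csv' DELIMITER ',' DIRECT");
            System.out.println("Rows loaded: " + loaded);
        }
    }
}
```

The DIRECT hint asks Vertica to write straight to disk storage, which is usually what you want for a large one-shot load.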

Hi,
Sorry for the delayed response. I tried your solution and noticed an uptick in speed of a couple of seconds. What I tested was outputting the Oracle data to a delimited file and then using tVerticaOutputBulkExec to upload the file. This method was faster for large tables, but for smaller tables it didn't seem to matter.
That said, I was trying to keep everything database-to-database instead of having to generate a file and then load it.
On our Vertica server, we installed a sqlplus add-on that can pull data from Oracle into Vertica much faster than Talend can. I was just curious whether I was missing anything when I set up the job. Thanks

Hi All,
I agree with this solution; it works well. My environment normally processes around ~100K rows/sec with:
Oracle --> tMap --> Vertica(outputBulkExecute)
cursor=500K, staging with an outfile, rejfile, and excfile.
Currently I'm looking into whether it's possible to do a partial commit using Vertica's outputBulkExecute.
Please advise.
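In case it is useful: the outfile/rejfile/excfile staging described above maps onto COPY's REJECTED DATA and EXCEPTIONS clauses, and COPY also accepts a NO COMMIT option that leaves the load uncommitted so you can validate it before committing yourself. A minimal sketch with hypothetical paths and table name (check your Vertica version's COPY documentation for the exact options):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class VerticaCopyNoCommit {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection string, table, and file paths.
        try (Connection vtc = DriverManager.getConnection(
                "jdbc:vertica://vtc-host:5433/db", "user", "pass");
             Statement st = vtc.createStatement()) {
            vtc.setAutoCommit(false);
            // REJECTED DATA / EXCEPTIONS capture bad rows instead of failing
            // the load; NO COMMIT keeps the load inside the open transaction.
            st.executeUpdate(
                "COPY tgt_table FROM LOCAL '/tmp/staging.csv' DELIMITER ','"
                + " REJECTED DATA '/tmp/staging.rej' EXCEPTIONS '/tmp/staging.exc'"
                + " NO COMMIT");
            vtc.commit(); // commit only after any validation checks pass
        }
    }
}
```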
