Anonymous
Not applicable

ETL is slow between Oracle to Vertica

Hi Everyone,

I have an Oracle-to-Vertica job built in the Talend open source version. I just have a tOracleInput --> tVerticaOutput. On the tOracleInput, I have "Use cursor" checked with the cursor size at 500,000. On the tVerticaOutput, I have "Use batch mode" selected and set to commit every 500,000 rows. Loading two million rows from Oracle to Vertica takes 3 to 4 minutes, which seems really long. If I dump the data from Oracle into a CSV file and then upload that file to Vertica, it takes seconds. Does anyone have experience with this? Am I missing some option I'm not aware of? Or is there a difference between Talend Open Source and Talend Open Studio that would make it faster? Thanks in advance.
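As an aside, the gap between row-by-row inserts and bulk loading can be illustrated outside Talend. The sketch below is purely illustrative: Talend generates Java/JDBC rather than Python, and the real cost difference over a network comes from per-statement round-trips, which an in-memory SQLite database does not have. The shape of the comparison still holds.

```python
import sqlite3
import time

# Illustrative only: compares one-INSERT-per-row against a batched
# submission, the same trade-off Talend's "Use batch mode" addresses.
rows = [(i, f"name-{i}") for i in range(100_000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER, name TEXT)")

# Row-by-row: one statement per row (roughly what you get without batch mode).
t0 = time.perf_counter()
for r in rows:
    conn.execute("INSERT INTO target VALUES (?, ?)", r)
conn.commit()
row_by_row = time.perf_counter() - t0

conn.execute("DELETE FROM target")
conn.commit()

# Batched: all rows submitted in one call (what batch mode approximates).
t0 = time.perf_counter()
conn.executemany("INSERT INTO target VALUES (?, ?)", rows)
conn.commit()
batched = time.perf_counter() - t0

count = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
print(count)  # 100000
```

Against a remote database, each unbatched statement additionally pays a network round-trip, which is why batch size matters far more there than in this local sketch.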

1 Solution

Accepted Solutions
TRF
Champion II

Hi,

 

Well, to confirm which part is slow (Oracle or Vertica), you can replace the tVerticaOutput with a simple tFileOutputDelimited.

Now, regarding the tVerticaOutput component, you also have to change the batch size (it's a separate setting from the commit interval).

But you may also switch to tVerticaOutputBulkExec, which (like all t<DB name>OutputBulkExec components) is dedicated to mass transfer.

Hope this helps.
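For context, the BulkExec-style components stage the rows to a delimited file and then hand the whole file to the database's bulk loader in one pass, rather than inserting row by row. A minimal sketch of that two-stage pattern follows; it uses SQLite, which has no COPY command, so `executemany` stands in for Vertica's bulk load here, and all names are illustrative.

```python
import csv
import os
import sqlite3
import tempfile

# Hypothetical sketch of the pattern a BulkExec component automates:
# 1) stage rows to a delimited file, 2) bulk-load the file in one pass.
source = [(1, "alpha"), (2, "beta"), (3, "gamma")]

# Step 1: stage to a delimited file (the tFileOutputDelimited step).
path = os.path.join(tempfile.mkdtemp(), "stage.csv")
with open(path, "w", newline="") as f:
    csv.writer(f, delimiter=";").writerows(source)

# Step 2: load the staged file in a single bulk operation
# (in Vertica this would be a COPY ... FROM LOCAL statement).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER, name TEXT)")
with open(path, newline="") as f:
    staged = [(int(i), n) for i, n in csv.reader(f, delimiter=";")]
conn.executemany("INSERT INTO target VALUES (?, ?)", staged)
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM target").fetchone()[0])  # 3
```

The staging file adds disk I/O, which is why the two-stage path tends to pay off on large tables but not necessarily on small ones.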


3 Replies

Anonymous
Not applicable
Author

Hi, 

Sorry for the delayed response. I tried your solution and did notice an uptick in speed of a couple of seconds. What I tested was outputting the Oracle data to a delimited file and then using tVerticaOutputBulkExec to upload the file. This method was faster for large tables, but for smaller tables it didn't seem to matter.

That said, I was trying to keep everything database to database instead of having to generate a file and then load it.

On our Vertica server, we installed a sqlplus add-on that can pull data from Oracle into Vertica much faster than Talend can. I was just curious whether I was missing anything when I set up the job. Thanks.

DriscollBunt
Contributor

Hi All,

I agree with this solution; it works well. In my environment, row processing is normally around ~100K rows/sec.

Oracle --> tMap --> Vertica (tVerticaOutputBulkExec)

cursor = 500K, staging with outfile, rejfile, and excfile.

Currently I'm looking into whether it's possible to do a partial commit using tVerticaOutputBulkExec for Vertica.

Please advise.