Job Performance Optimization

abhi90 · ‎2018-05-16

Hi All,

I have designed the job in the format attached. I am converting all the columns from String to their respective target types and doing an error check. I have enabled "Use cursor" in the tOracleInput component, and in both tOracleOutput components I have set the batch size to 500000, which is much more than the data I expect to load. What I am seeing is that my data is fetched in 5.32 seconds, but the job then takes a long time to commit or finish. Both tOracleOutput components use "Use an existing connection" with the connection established in my job. Without tOracleCommit I get a throughput of 1108 rows/sec, and with tOracleCommit I now get 1233 rows/sec. Is there any way to make it faster?

@rhall, @manodwhb, @vboppudi, @cterenzi, @vapukov, @shong, @xdshi

manodwhb · ‎2018-05-16

@abhi90, can you check with tOracleOutputBulkExec whether there is a target bottleneck?

abhi90 · ‎2018-05-16

Hi @manodwhb,

tOracleBulkExec is for loading from a data file. In my case tMap is a must, as I have to do error validation on the incoming values of some columns. I have attached snapshots of how I am doing that.

ERROR_CHECK_CRITEO.JPG
Validation_CRITEO.JPG

Anonymous · ‎2018-05-16

Is there any reason you are using a shared connection? Try setting your output component to use its own connection and handle its own committing. Also, your batch size seems a little high. Scale that back to something between 1000 and 10000.

You should remember that the database is doing significantly more work when inserting data than when reading it. I think you should be able to get much faster than 1000 rows a second, but you will never get the same speed as you do with the read.
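
To illustrate the batch-size suggestion above, here is a minimal Java sketch of what flushing in fixed-size batches means. It is not Talend's generated code: `BatchedWriter`, `writeInBatches`, and the in-memory sink are hypothetical names standing in for the batch execute and commit that an output component would perform.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of Talend or JDBC.
final class BatchedWriter {

    // Buffers rows and passes them to the sink in chunks of at most batchSize.
    // Returns the number of flushes, i.e. how many batches the sink received.
    static <T> int writeInBatches(Iterable<T> rows, int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> buffer = new ArrayList<>();
        int flushes = 0;
        for (T row : rows) {
            buffer.add(row);
            if (buffer.size() == batchSize) {
                // Stand-in for "execute the batch and commit" (assumption).
                sink.accept(new ArrayList<>(buffer));
                buffer.clear();
                flushes++;
            }
        }
        // Send whatever is left over after the last full batch.
        if (!buffer.isEmpty()) {
            sink.accept(new ArrayList<>(buffer));
            flushes++;
        }
        return flushes;
    }
}
```

With an example input of 32770 rows, a batch size of 10000 produces four flushes (three of 10000 rows and one of 2770), while a batch size of 500000 produces a single flush that holds every row in memory before anything is sent. Which value is fastest still has to be measured in your own environment.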

abhi90 · ‎2018-05-16

Hi @rhall,

Are you suggesting that tOracleOutput should use the connection type "Repository"? If I have 32770 rows, what value should I set for "COMMIT_EVERY" in the tOracleOutput settings? If I am not using lookups, what should my buffer size in tMap be? I have currently set it to 2400000. Would this improve my job performance?

abhi90 · ‎2018-05-16

@TRF

Anonymous · ‎2018-05-16

I do not "know" these things, I experiment with them. They change per query, per environment, etc. Otherwise Talend would just hard code what they believe to be the optimized values.

You should untick "Use an existing connection" in your tOracleOutput component and configure the connection there.

abhi90 · ‎2018-05-17

Hi @shong, @xdshi,

Please let me know if you can help me optimize the performance of the job. If so, I will send you my job.

Anonymous · ‎2018-05-17

Did you not get an improvement trying what I suggested?

Anonymous · ‎2018-05-18

Hello,

Could you please upload a screenshot of your whole workflow to the forum?

The database connection could affect the job performance. A job generally runs faster if the database is installed locally. If the database is on another machine, even over a VPN, you may run into congestion and latency issues.
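
As a rough illustration of the latency point above, the following Java sketch estimates the time spent only waiting on the network. It assumes a simplified model of one round trip per batch. `LatencyEstimate`, its method names, and the 20 ms round-trip time used in the note below are hypothetical and not measured from this job.

```java
// Hypothetical helper for a back-of-the-envelope estimate.
final class LatencyEstimate {

    // Round trips needed to send `rows` rows in batches of `batchSize`,
    // assuming one round trip per batch (a simplification).
    static long roundTrips(long rows, long batchSize) {
        if (rows < 0 || batchSize <= 0) {
            throw new IllegalArgumentException("rows must be >= 0 and batchSize > 0");
        }
        return (rows + batchSize - 1) / batchSize; // ceiling division
    }

    // Time spent waiting on the network, in milliseconds.
    static long networkWaitMillis(long rows, long batchSize, long roundTripMillis) {
        return roundTrips(rows, batchSize) * roundTripMillis;
    }
}
```

Under these assumptions, 32770 rows sent one at a time over a 20 ms link wait 655400 ms (about 11 minutes) on the network, while the same rows in batches of 10000 need four round trips, or 80 ms. Real numbers depend on the driver, the network, and the database.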

Best regards

Sabrina

Talend Data Integration

v7.x