We are reading 21 .dat files holding 22 million records in total, then writing into two target tables plus reject files after a schema check. This is a migration project and we are trying to match the DataStage run time of 1.3 minutes, whereas in Talend the same job takes 2.6 minutes.
Job design
1) tFileList (reading 21 files) --> tFileInputDelimited --> tMap1
2) tDBInput1 --> tMap1
3) tMap1 --> splits into 2 flows:
flow 1 --> tSchemaComplianceCheck1 --> tDBOutput1 and tFileOutputDelimited1
flow 2 --> tSchemaComplianceCheck2 --> tDBOutput2 and tFileOutputDelimited2
The 2.6-minute run uses the following configuration:
1) Iterate link - parallel execution enabled, set to 4
2) tDBInput1 - Fetch Size 10000
3) tDBOutput1 and tDBOutput2:
BATCH_SIZE - 100000
COMMIT EVERY - 50000
Parallel Execution - 12
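As a side note on what BATCH_SIZE and COMMIT EVERY control: rows are sent to the database in batches (fewer round trips), and the transaction is committed every N rows (bounding transaction size). The sketch below is not Talend-generated code, just a minimal Python/sqlite3 illustration of that batching-vs-commit interplay; the table and row values are hypothetical.

```python
import sqlite3

BATCH_SIZE = 100_000    # rows per executemany round trip
COMMIT_EVERY = 50_000   # rows per transaction commit

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER, payload TEXT)")

rows = [(i, f"rec{i}") for i in range(250_000)]  # stand-in for the main flow

pending = 0
for start in range(0, len(rows), BATCH_SIZE):
    batch = rows[start:start + BATCH_SIZE]
    # One bulk statement per batch instead of one round trip per row
    conn.executemany("INSERT INTO target VALUES (?, ?)", batch)
    pending += len(batch)
    if pending >= COMMIT_EVERY:
        conn.commit()   # keep transactions bounded
        pending = 0
conn.commit()           # flush any remaining uncommitted rows

count = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
print(count)  # 250000
```

Tuning these two numbers against each other (large batches, but commits frequent enough not to strain the DB's undo/redo handling) is usually where the write-side time goes.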
Can anyone suggest performance improvement steps to close this gap?
Hi @Rakesh Kumar ,
You can try allocating more memory to the JVM: Run tab --> Advanced settings --> Use specific JVM arguments:
-Xms<size>M - memory allocated at job launch
-Xmx<size>M - maximum memory the JVM may allocate
In tDBOutput, do you use Insert or Update?
Also, if the two tSchemaComplianceCheck components are identical, make the split after a single compliance check instead of duplicating it on both flows.
Send me Love and kudos
tDBInput1 looks like a lookup table. If its dataset is always the same, you can write its content into a tHashOutput once and reuse it with a tHashInput as the actual lookup input to tMap1.
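The tHashOutput/tHashInput idea is simply: load the lookup data into memory once, then probe it per main-flow row instead of re-querying the database. A minimal sketch of that pattern (plain Python, not Talend code; all names and sample values are hypothetical):

```python
def load_lookup(rows):
    """Build an in-memory index once (what tHashOutput does)."""
    return {key: value for key, value in rows}

def enrich(main_rows, lookup):
    """Join each main-flow row against the cached lookup
    (what tMap does when fed from tHashInput)."""
    return [(rid, lookup.get(code, "UNKNOWN")) for rid, code in main_rows]

lookup_rows = [("A", "Active"), ("I", "Inactive")]  # would come from tDBInput1
cache = load_lookup(lookup_rows)                    # loaded once, reused for every row
result = enrich([(1, "A"), (2, "X")], cache)
print(result)  # [(1, 'Active'), (2, 'UNKNOWN')]
```

The win is that the lookup cost becomes a one-time load plus O(1) probes, rather than repeated database reads on each run of the subjob.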
Parallelisation in tDBOutput is often not a performance win; it can even hurt, because parallel writers may deadlock on the target table.