We are reading 21 .dat files holding 22 million records in total, then writing into two target tables plus reject files after a schema check. This is a migration project and we are trying to match the DataStage run time of 1.3 minutes, whereas in Talend the same job takes 2.6 minutes.
Job design
1) tFileList (reading 21 files) --> tFileInputDelimited --> tMap1
2) tDBInput1 --> tMap1
3) tMap1 --> splits into 2 flows:
flow 1 --> tSchemaComplianceCheck1 --> tDBOutput1 and tFileOutputDelimited1
flow 2 --> tSchemaComplianceCheck2 --> tDBOutput2 and tFileOutputDelimited2
The 2.6-minute run uses the following configuration:
1) Iterate link - parallel execution enabled, set to 4
2) tDBInput1 - Fetch Size 10000
3) tDBOutput1 and tDBOutput2:
BATCH_SIZE - 100000
COMMIT EVERY - 50000
Parallel Execution - 12
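As a side note on what BATCH_SIZE and COMMIT EVERY control: rows are sent to the database in batches (fewer round trips), and the transaction is committed every N rows (bounding transaction size). The sketch below is not Talend-generated code, just a minimal Python/sqlite3 illustration of that batching-vs-commit interplay; the table and row values are hypothetical.

```python
import sqlite3

BATCH_SIZE = 100_000    # rows per executemany round trip
COMMIT_EVERY = 50_000   # rows per transaction commit

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER, payload TEXT)")

rows = [(i, f"rec{i}") for i in range(250_000)]  # stand-in for the main flow

pending = 0
for start in range(0, len(rows), BATCH_SIZE):
    batch = rows[start:start + BATCH_SIZE]
    # One bulk statement per batch instead of one round trip per row
    conn.executemany("INSERT INTO target VALUES (?, ?)", batch)
    pending += len(batch)
    if pending >= COMMIT_EVERY:
        conn.commit()   # keep transactions bounded
        pending = 0
conn.commit()           # flush any remaining uncommitted rows

count = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
print(count)  # 250000
```

Tuning these two numbers against each other (large batches, but commits frequent enough not to strain the DB's undo/redo handling) is usually where the write-side time goes.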
Can anyone suggest performance improvement steps to close this gap?
Hi @Rakesh Kumar ,
You can try allocating more memory to the JVM: Run tab --> Advanced settings --> Use specific JVM arguments:
-Xms<size>M - memory allocated at job launch
-Xmx<size>M - maximum memory the JVM may allocate
In tDBOutput, do you use Insert or Update?
Also, if the two tSchemaComplianceCheck components are identical, make the split after a single compliance check instead of duplicating it on both flows.
Send me Love and kudos
tDBInput1 looks like a lookup table. If its dataset is always the same, you can write its content into a tHashOutput once and reuse it with a tHashInput as the actual lookup input to tMap1.
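The tHashOutput/tHashInput idea is simply: load the lookup data into memory once, then probe it per main-flow row instead of re-querying the database. A minimal sketch of that pattern (plain Python, not Talend code; all names and sample values are hypothetical):

```python
def load_lookup(rows):
    """Build an in-memory index once (what tHashOutput does)."""
    return {key: value for key, value in rows}

def enrich(main_rows, lookup):
    """Join each main-flow row against the cached lookup
    (what tMap does when fed from tHashInput)."""
    return [(rid, lookup.get(code, "UNKNOWN")) for rid, code in main_rows]

lookup_rows = [("A", "Active"), ("I", "Inactive")]  # would come from tDBInput1
cache = load_lookup(lookup_rows)                    # loaded once, reused for every row
result = enrich([(1, "A"), (2, "X")], cache)
print(result)  # [(1, 'Active'), (2, 'UNKNOWN')]
```

The win is that the lookup cost becomes a one-time load plus O(1) probes, rather than repeated database reads on each run of the subjob.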
Parallelisation in tDBOutput is often not a performance win; it can even hurt, because parallel writers may deadlock on the target table.