RakeshKumar1
Contributor

performance improvement

Reading 21 .dat files holding 22 million records in total, and writing into two target tables plus reject files after a schema check. This is a migration project, and we are trying to match the DataStage job run time of 1.3 minutes, whereas in Talend it is taking 2.6 minutes.

 

Job design

 

1) tFileList (reading 21 files) --> tFileInputDelimited --> tMap1

2) tDBInput1 --> tMap1 (lookup)

3) tMap1 --> splits into 2 flows:

   flow 1 --> tSchemaComplianceCheck1 --> tDBOutput1 and tFileOutputDelimited1

   flow 2 --> tSchemaComplianceCheck2 --> tDBOutput2 and tFileOutputDelimited2

The 2.6-minute run time was achieved with the configuration below:

1) Iterate link - parallel execution enabled, set to 4

2) Fetch Size in tDBInput1 - 10000

3) tDBOutput1 and tDBOutput2

   BATCH_SIZE - 100000

   COMMIT EVERY - 50000

   Parallel Execution - 12
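One thing that may be worth a second look in that configuration: COMMIT EVERY (50,000) is smaller than BATCH_SIZE (100,000), so commits fire more often than batch flushes, which can add round trips. A back-of-envelope helper (hypothetical illustration, not Talend code) to compare the counts:

```java
// Hypothetical sanity-check: how many batch flushes vs. commits the
// tDBOutput settings imply for the full 22M-row load.
public class BatchMath {
    // number of batch flushes needed to write `rows` records
    static long flushes(long rows, long batchSize) {
        return (rows + batchSize - 1) / batchSize;
    }

    // number of commits issued at the given commit interval
    static long commits(long rows, long commitEvery) {
        return (rows + commitEvery - 1) / commitEvery;
    }

    public static void main(String[] args) {
        long rows = 22_000_000L;
        System.out.println(flushes(rows, 100_000)); // prints 220
        System.out.println(commits(rows, 50_000));  // prints 440
    }
}
```

With these settings the job commits twice as often as it flushes a batch; raising COMMIT EVERY to a multiple of BATCH_SIZE (or at least to the same value) usually reduces commit overhead, at the cost of larger transactions.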

 

Can anyone please suggest performance improvement steps?

4 Replies
RakeshKumar1
Contributor
Author

Can anyone help with this question? How can I improve the performance?

gjeremy1617088143

Hi @Rakesh Kumar​ ,

You can try to allocate more memory to the JVM: Run tab --> Advanced settings --> Use specific JVM arguments

-Xms<number>M : memory allocated at the launch of the job

-Xmx<number>M : maximum memory allocated.
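For example (hypothetical sizes; pick values that fit the machine's RAM and the job's footprint), the arguments look like this, whether entered in the Run tab or passed to an exported job:

```shell
# Hypothetical JVM memory settings for a Talend job.
# In Studio: Run tab --> Advanced settings --> Use specific JVM arguments.
# For an exported job, the same flags go on the java command line,
# e.g. (class path and main class below are placeholders):
java -Xms2048M -Xmx8192M -cp <job_classpath> <job_main_class>
```

Setting -Xms equal to (or close to) -Xmx avoids heap-resize pauses during the run.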

 

In tDBOutput, do you use Insert or Update?

 

Also, if the two tSchemaComplianceCheck components are identical, do the check once and make the split after it.

Send me Love and kudos

 

 

Anonymous
Not applicable

The tDBInput1 looks like a lookup table. If the dataset is always the same, you can write its content into a tHashOutput beforehand and reuse it with a tHashInput for the actual lookup into tMap1.
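The idea behind that tHashOutput/tHashInput suggestion can be sketched in plain Java (a hypothetical in-memory analogue, with stand-in keys and values, not the generated Talend code): load the reference data into a map once, then probe it per main-flow row instead of re-reading the database.

```java
import java.util.HashMap;
import java.util.Map;

public class LookupCache {
    // Load the reference data once (analogue of tHashOutput filled
    // from tDBInput1). The keys/values here are illustrative stand-ins.
    static Map<String, String> loadLookup() {
        Map<String, String> cache = new HashMap<>();
        cache.put("K1", "V1");
        cache.put("K2", "V2");
        return cache;
    }

    public static void main(String[] args) {
        Map<String, String> lookup = loadLookup(); // built once, reused per row
        // Analogue of the tMap1 lookup: O(1) probe instead of a DB query.
        System.out.println(lookup.getOrDefault("K1", "REJECT")); // prints V1
        System.out.println(lookup.getOrDefault("K9", "REJECT")); // prints REJECT
    }
}
```

The win is that the 22 million main-flow rows each hit an in-memory hash probe rather than the database; the trade-off is that the whole lookup set must fit in the JVM heap (hence the -Xmx advice above).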

Anonymous
Not applicable

Parallelisation in tDBOutput is often not a performance win; it can actually kill performance because of deadlocks on the target table.