benu
Contributor III

Iterating over a directory and parsing XML files runs progressively slower

Hey folks... I've got a pretty simple job that reads XML files from a directory and passes them to a tFileInputXML component, parsing out half a dozen fields and appending them to an Excel file. As it runs, it gets progressively slower: it starts out at maybe 10 files/second, and now that I'm 2,600 files in, it's down to about 1 file every 3 seconds. I still have three times that many left to process, so this job is going to take hours.

Has anyone seen this behavior? Any recommendations? I'll post a screenshot of my job setup, and I'm happy to provide any component parameters if it'll help us figure out how to make it faster or more linear.

Thanks in advance! -Ben

(Screenshot of the job setup: 0695b00000H8O59AAF.png)

1 Solution

Accepted Solutions
Anonymous
Not applicable

Hello

Try the following changes to see if they improve the performance.

1. Allocate more memory to the job execution: open the Run view, click the Advanced settings panel, check the 'Use specific JVM parameters' box, and raise the JVM memory parameters.

2. Remove the tLogRow component from the job; this component is only for debugging purposes.

3. Click the Iterate connector and enable parallel execution.

4. If the amount of data in the files is not very large, cache the data in memory using tUnite before tFileOutputExcel to avoid too much file I/O.
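To illustrate the first suggestion: the JVM arguments box under Run view > Advanced settings accepts standard JVM flags, and Talend Studio's defaults are often modest. A hypothetical example (the right values depend on your machine and data volume):

```
-Xms1024M
-Xmx4096M
```

`-Xms` sets the initial heap size and `-Xmx` the maximum; raising `-Xmx` gives the job more headroom before garbage collection starts dominating run time.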

 

Regards

Shong


4 Replies

gjeremy1617088143

Hi,

You are doing extraction, transformation, and load all at the same time, which can make the job very slow.

Maybe you can try sending all the data to memory with tHashOutput, for example, and then, after an OnSubjobOk link, tHashInput --> main row --> tFileOutputExcel.

You can test it like this: deactivate tFileOutputExcel and watch your rows/s processing speed. If the average row processing speed is then good, my solution should work for you.
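To make the in-memory caching idea concrete, here is a plain-Java sketch (illustrative names only, not Talend-generated code) of the pattern tHashOutput/tHashInput implements: buffer all parsed rows in memory, then open the output once and write everything in a single pass, instead of reopening and appending the output file on every iteration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class BufferedWriteDemo {
    // Buffer every row in memory (the tHashOutput role), then write the
    // output file once at the end (the tHashInput -> tFileOutputExcel role).
    static int writeBuffered(int fileCount) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < fileCount; i++) {
            rows.add("file_" + i + ",value_" + i); // stand-in for one parsed XML file
        }
        try {
            Path out = Files.createTempFile("rows", ".csv");
            Files.write(out, rows);                // single open/write/close
            return Files.readAllLines(out).size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeBuffered(1000)); // prints 1000
    }
}
```

Appending to an Excel workbook once per iteration likely explains the progressive slowdown in the original question: each append rereads and rewrites the growing file, so iteration N costs roughly N units of I/O, while buffering reduces the total write cost to a single pass.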

Send me Love and Kudos

 

benu
Contributor III
Author

Hello folks. @Shong - those suggestions worked great! I implemented all but the parallel execution setting on the Iterate connector, because when that was enabled my job would not compile (error: "Detail Message: Local variable tos_count_tFileOutputExcel_1 defined in an enclosing scope must be final or effectively final. There may be some other errors caused by JVM compatibility. Make sure your JVM setup is similar to the studio.").
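For background on that compiler message (a generic Java sketch, not Talend's generated code): enabling parallel execution plausibly wraps the iterated flow in a lambda or anonymous class, and Java only allows such code to capture local variables that are final or effectively final, which a reassigned counter like tos_count_tFileOutputExcel_1 is not. A minimal illustration of the rule and the usual single-element-array workaround:

```java
public class EffectivelyFinalDemo {
    // A lambda or anonymous class may only capture locals that are final or
    // effectively final. Capturing a plain mutable int would not compile, but
    // a final array reference works: the reference never changes, only its
    // contents do.
    static int incrementViaLambda(int times) {
        final int[] counter = {0};
        Runnable r = () -> counter[0]++; // legal: 'counter' is effectively final
        for (int i = 0; i < times; i++) {
            r.run();
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(incrementViaLambda(3)); // prints 3
    }
}
```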

 

However, when the other changes were made, the job executed in a few seconds, rather than taking hours.

 

Thanks very much for your prompt response and helpful advice! -Ben

Anonymous
Not applicable

@ben uphoff​, I tried it and have the same issue. I am not sure if it is a bug; I will check it with our developers.

As gjeremy1617088143 suggested, send all the data to memory with tHashOutput, for example, and then, after an OnSubjobOk link, tHashInput --> main row --> tFileOutputExcel.

 

Regards

Shong