gurn
Contributor III

DataPrep of Large File Fails with Read Timeout

Hi

We have implemented a job that pulls 53M records with 11 columns and passes them through the tRunDataPrep component. The recipe is pretty simple: nothing more complex than uppercasing, removing whitespace, extracting numbers from fields, etc.

We are using a joblet to run the execution: we have 11 different sources and need a different schema per recipe, so this keeps the configuration requirements for each to a minimum. We have looked at Dynamic Schema, but it doesn't work for us given the way we need to orchestrate, so this is an approach we are comfortable with.

The job runs fine until it reaches about 6M records, at which point it times out with a socket error. I've attached images of the flow and the error and would appreciate some thoughts on a resolution. Following some research into JVM timeout parameters, I have just added

-Dws_client_connection_timeout=180000

-Dws_client_receive_timeout=180000

and am now rerunning, but I would still appreciate some thoughts on this issue and how best to resolve it.
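For reference, I'm setting these as JVM arguments on the job run in Studio. If you launch the exported job directly, the equivalent java command line would look something like the line below (the classpath, jar, and job class names are placeholders, not the actual names from my project):

java -Dws_client_connection_timeout=180000 -Dws_client_receive_timeout=180000 -cp "lib/*:dataprep_job_0_1.jar" myproject.dataprep_job_0_1.DataPrep_Job

Both values are in milliseconds, so 180000 should allow the web-service call three minutes before the connection or read is abandoned.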

Attached is the flow plus the error output.

2 Replies
Anonymous
Not applicable

Hi

I have directed your question to our developers and hope they can get back to you soon.


Regards

Shong

gurn
Contributor III
Author

Thanks


I have restructured the job so that it now extracts the data into flat files in 2M-record chunks. These are then read into the next subjob and passed into tRunDataPrep. Everything now works and executes as expected.
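In case it helps anyone hitting the same error, the workaround is just row-counted file rotation between the extract and the DataPrep step. A minimal Java sketch of the equivalent logic, assuming a headerless delimited extract (the 2M figure is the chunk size from my job; the file names and the reader are stand-ins for the actual source):

import java.io.*;

public class ChunkExtract {
    public static void main(String[] args) throws IOException {
        final long ROWS_PER_CHUNK = 2_000_000L; // 2M records per flat file
        long row = 0;
        int chunk = 0;
        BufferedWriter out = null;
        try (BufferedReader in = new BufferedReader(new FileReader("extract.csv"))) {
            String line;
            while ((line = in.readLine()) != null) {
                // Rotate to a new chunk file every ROWS_PER_CHUNK rows
                if (row % ROWS_PER_CHUNK == 0) {
                    if (out != null) out.close();
                    out = new BufferedWriter(new FileWriter("chunk_" + chunk++ + ".csv"));
                }
                out.write(line);
                out.newLine();
                row++;
            }
        } finally {
            if (out != null) out.close();
        }
    }
}

Each chunk then goes through tRunDataPrep on its own, which keeps every call to the DataPrep service comfortably under the timeout.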


It would be good to know if there is a way to get this through without having to chunk the data, so please do provide an update here when one is available.


Thanks

Dave