MPT (Contributor III)

tFileInputDelimited (or job?) stops working seemingly randomly

We are seeing a weird problem with TOS 7.3. We build and deploy a standalone job and run it on a Windows machine. The job executes every 10 minutes via a task scheduler. The structure is as follows (a rough sketch of the orchestration follows the list):

  • A main job downloads files from FTP
  • The main job then begins to iterate over the collection of files
    • Each iteration calls a tRunJob with the iterated file name as a parameter
      • The child job has a tFileInputDelimited that consumes the file and does all sorts of fun stuff
      • After processing the file, the child job ends and the iteration continues to the next file in the main job

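Roughly, in plain Java terms, the orchestration above boils down to something like the sketch below. This is a minimal, hypothetical illustration; the class, directory path, and method names are assumptions, and the code Talend actually generates is far more elaborate.

    import java.io.File;

    public class ParentJobSketch {
        public static void main(String[] args) {
            // tFTPGet has already downloaded the files into a shared folder (path is illustrative)
            File downloadDir = new File("\\\\server\\share\\incoming");
            // tFileList_3: enumerate the downloaded CSV files
            File[] files = downloadDir.listFiles((dir, name) -> name.endsWith(".csv"));
            if (files == null) {
                throw new IllegalStateException("Directory not readable: " + downloadDir);
            }
            for (File f : files) {
                // tRunJob_5: hand the current file name to the child job as a parameter
                runChildJob(f.getAbsolutePath());
            }
        }

        // Stand-in for the generated child-job invocation; the real child job reads
        // the file with tFileInputDelimited and processes the rows
        static void runChildJob(String filePath) {
            System.out.println("Received file '" + filePath + "' as input parameter");
        }
    }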

Here's a small extract of a successful log file:


2021-12-09 16:55:04,692 [INFO] d.t.TALE35_AbsencesAndHolidaysService [Thread-2] tFileList_3 - Start to list files

2021-12-09 16:55:04,692 [INFO] d.t.TALE35_AbsencesAndHolidaysService [Thread-2] tFileList_3 - Current file or directory path : \\xxxxxxxx\xxxxxxxx\xxxxxxxxx\xxxxxxxx\xxxxxxxx\xxxxxxxx\xxxxxxxxxxxxxxxxxxxxxxxxxx.csv

2021-12-09 16:55:04,707 [INFO] d.t.TALE35_AbsencesAndHolidaysService [Thread-2] tRunJob_5 - The child job 'dataintegration.absencesandholidaystotable_1_2.AbsencesAndHolidaysToTable' starts on the version '1.2' with the context 'Production'.

2021-12-09 16:55:04,707 [INFO] d.a.AbsencesAndHolidaysToTable [Thread-2] TalendJob: 'AbsencesAndHolidaysToTable' - Start.

Received file'\\xxxxxxxx\xxxxxxxx\xxxxxxxxx\xxxxxxxx\xxxxxxxx\xxxxxxxx\xxxxxxxxxxxxxxxxxxxxxxxxxx.csv' as input parameter

2021-12-09 16:55:05,364 [INFO] d.a.AbsencesAndHolidaysToTable [Thread-2] tFileInputDelimited_2 - Retrieving records from the datasource.

2021-12-09 16:55:05,364 [INFO] d.a.AbsencesAndHolidaysToTable [Thread-2] tFileInputDelimited_2 - Retrieved records count: 1.

The job might execute successfully 120 times or more (every 10 minutes, for 24, 25, or 26 hours), but then, without any clear reason, one of the executions dies like this:

2021-12-09 15:25:03,932 [INFO] d.t.TALE35_AbsencesAndHolidaysService [Thread-2] tFileList_3 - Start to list files

2021-12-09 15:25:03,932 [INFO] d.t.TALE35_AbsencesAndHolidaysService [Thread-2] tFileList_3 - Current file or directory path : \\xxxxxxxx\xxxxxxxx\xxxxxxxxx\xxxxxxxx\xxxxxxxx\xxxxxxxx\xxxxxxxxxxxxxxxxxxxxxxxxxx.csv

2021-12-09 15:25:03,963 [INFO] d.t.TALE35_AbsencesAndHolidaysService [Thread-2] tRunJob_5 - The child job 'dataintegration.absencesandholidaystotable_1_2.AbsencesAndHolidaysToTable' starts on the version '1.2' with the context 'Production'.

2021-12-09 15:25:03,963 [INFO] d.a.AbsencesAndHolidaysToTable [Thread-2] TalendJob: 'AbsencesAndHolidaysToTable' - Start.

Received file'\\xxxxxxxx\xxxxxxxx\xxxxxxxxx\xxxxxxxx\xxxxxxxx\xxxxxxxx\xxxxxxxxxxxxxxxxxxxxxxxxxx.csv' as input parameter

2021-12-09 15:25:04,526 [INFO] d.a.AbsencesAndHolidaysToTable [Thread-2] tFileInputDelimited_2 - Retrieving records from the datasource.

2021-12-09 15:25:04,542 [INFO] d.a.AbsencesAndHolidaysToTable [Thread-2] tFileInputDelimited_2 - Retrieved records count: 0.

[statistics] disconnected

[statistics] disconnected

[statistics] disconnected

[statistics] disconnected

[statistics] disconnected

[statistics] disconnected

[statistics] disconnected

[statistics] disconnected


The file definitely contains data, and by all accounts valid data. Once this happens, the tFileInputDelimited component cannot consume the FTP files anymore, no matter how many times we try to rerun the job. And here is the weird thing:


If we rebuild the job without changing anything, just rebuild it over the old deployment, then on the next run the job works again and can consume the files. Then it dies again some 24, 25, or 26 hours later and does not recover on its own. If we again rebuild the job to an executable JAR, the problem goes away. It is as if the deployed package breaks every now and then, but I can't imagine any scenario where that might happen.


So I'm looking for debugging ideas: what could cause the deployed package to repeatedly break in such a way that rebuilding the job fixes the problem? (A pre-read sanity check is sketched after the list below.)


  • It does not seem to be the files themselves, because after rebuilding the job, the very same files are consumed that the job could not consume just minutes earlier
  • The number of files (a few or 50) does not seem to make any difference; the job might die even when there is just one file to process
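
One pre-read sanity check (as referenced above): a tJava component placed just before tFileInputDelimited_2 in the child job could log the file's state at the moment of failure, to separate "file not accessible over the share" from "file readable but parsed as 0 rows". A minimal sketch, assuming the incoming path sits in a context variable named inputFile (that name is hypothetical):

    // Hypothetical tJava snippet; context.inputFile is an assumed variable name
    java.io.File f = new java.io.File(context.inputFile);
    System.out.println("exists=" + f.exists()
            + " canRead=" + f.canRead()
            + " length=" + f.length()
            + " lastModified=" + new java.util.Date(f.lastModified()));
    // Peek at the first line to see whether the share actually serves content
    try (java.io.BufferedReader r = new java.io.BufferedReader(
            new java.io.InputStreamReader(new java.io.FileInputStream(f), "UTF-8"))) {
        System.out.println("first line: " + r.readLine());
    } catch (java.io.IOException e) {
        System.out.println("pre-read check failed: " + e);
    }

If a failing run logs exists=true but length=0, or the first-line read fails, the problem is on the file or SMB side rather than inside the component.
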
4 Replies
Anonymous

The files are located in a shared folder, and each file has data? I see one log message mentions "[Thread-2] tFileInputDelimited_2 - Retrieved records count: 0.", so it looks like the error occurs when accessing the file. For debugging, check the 'Die on error' box on the tFileInputDelimited component and the 'Die on error' box on tRunJob; this lets the components throw the underlying Java exception as soon as an error occurs.

Regards

Shong


MPT (Contributor III, Author)

"The files are located in a shared folder? and each file has data?"


Correct. The main job first downloads the files to the shared folder (there might be 1 file or 200), and after tFTPGet is done, we start to iterate over the downloaded files and pass them to the child job. Every file we have investigated contains correct data, the same kind of data that has passed before. We even checked newline characters and file encodings to see whether they differ in the files that fail to be processed, but that is not the case either. The shared folder seems by all accounts to be accessible all the time, and no errors indicate a file access issue. (A byte-level comparison sketch follows.)
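
Since the newline and encoding checks were done by eye in an editor, a byte-level comparison may be more conclusive. A minimal standalone sketch (the class name and usage are illustrative): run it against one failing file and one succeeding file, then diff the output.

    import java.io.FileInputStream;
    import java.io.IOException;

    public class HexHead {
        public static void main(String[] args) throws IOException {
            // Dump the first 64 bytes in hex: this exposes BOMs (ef bb bf),
            // CR/LF differences (0d 0a vs 0a), and truncated or empty files
            try (FileInputStream in = new FileInputStream(args[0])) {
                byte[] buf = new byte[64];
                int n = in.read(buf);
                for (int i = 0; i < n; i++) {
                    System.out.printf("%02x ", buf[i]);
                }
                System.out.println();
            }
        }
    }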


"I see one log message mentions "[Thread-2] tFileInputDelimited_2 - Retrieved records count: 0." It looks like the error occurs on accessing the file. For debugging, check the 'die on error' box on tFileInputDelimited component and the 'die on error' box on tRunJob, this allows the component throws out the Java exception once an error occurs."


These are already checked. The last thing in the job logs is this:

2021-12-09 15:25:04,542 [INFO] d.a.AbsencesAndHolidaysToTable [Thread-2] tFileInputDelimited_2 - Retrieved records count: 0.

[statistics] disconnected

[statistics] disconnected

[statistics] disconnected

[statistics] disconnected

[statistics] disconnected

[statistics] disconnected

[statistics] disconnected

[statistics] disconnected


The death of the job does not seem to be clean or controlled. When the job succeeds, the last line is:

2021-12-10 07:35:04,684 [INFO] d.t.TALE35_AbsencesAndHolidaysService [main] TalendJob: 'TALE35_AbsencesAndHolidaysService' - Done.
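
Given that the failing runs never reach a 'Done.' line, one way to probe the unclean death is to register a JVM shutdown hook at the start of the parent job (for example in a tJava component) so the log records when the JVM is torn down from outside, e.g. by a Task Scheduler timeout or another process. A minimal sketch; the placement and message are assumptions, not part of the actual job:

    // Hypothetical tJava snippet at parent-job start: stamp the log on JVM exit
    Runtime.getRuntime().addShutdownHook(new Thread(() ->
            System.err.println("JVM shutdown at " + new java.util.Date()
                    + " (no preceding 'Done.' line means the run was cut short)")));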

Anonymous

Hello,

Did you try modifying the context variables in the parent job's tRunJob, and selecting the 'Transmit whole context' and 'Die on child error' check boxes?

Would you mind posting your job design screenshots here? They will be helpful for us to address your issue. Please mask your sensitive data.


Best regards

Sabrina

MPT (Contributor III, Author)

I have not experimented with the 'Transmit whole context' option, but 'Die on child error' is checked. As you review these settings, please consider that the job works 99.9% of the time. When it fails, there are no clear abnormalities to be detected: the files that fail to process are the same size as the previous ones that succeeded, the folder and file names follow the same syntax, and they look like they should have processed; in fact, they do process after the job is rebuilt and rerun.


Here is the main job:

[screenshot: 0695b00000Lx595AAB.png]

Here is the tRunJob (one of those):

[screenshot: 0695b00000Lx5CiAAJ.png]

Here is the child job. The component "Absence or Holiday" is the tFileInputDelimited_2 that is the main concern of our issue. That is the component that randomly fails to read a file which, by every inspection we can do in Notepad++, contains valid data. And after the job is rebuilt without modifications and rerun, the file is read. Go figure.

[screenshot: 0695b00000Lx59jAAB.png]