kleinmat
Contributor III

Find out if a ZIP / GZ file is corrupt

Hi there,
I have this very "funny" scenario where I receive .GZ files from a partner system, but some of the files are corrupt.
So I need a process that checks the integrity of each file and skips it if it is corrupt.
The process must not abort, though, which is the default behavior of Talend.
So how can I make sure I only process "good" files and skip "bad" ones?
Thanks
Matt

6 Replies
willm1
Creator

I expected the 'check integrity' option to skip bad files and proceed gracefully. As an alternative, how about using a tSystem component to execute the unzip command once per file? That way you can also capture which files returned errors.
kleinmat
Contributor III
Author

That's what I expected, too. Unfortunately, it does not act that way. Instead, it crashes as if the checkbox were not set. Well, "crashing" is not the right word: it throws an exception and jumps out of the iterator, so it does not continue after detecting a bad file.
Plus, the integrity check does not even work properly. For example, it does not recognize incomplete files, such as files that were compressed properly but only partially transmitted to the target system.
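For reference, one way to test a .GZ file outside the built-in check is to decompress it completely and treat any I/O error as "bad". The following is only a minimal sketch using the standard java.util.zip classes, not Talend's own implementation; the class name GzipCheck and the method name isGzipIntact are made up for illustration. Reading to the end of the stream means a file whose compressed data or trailer was cut off during transfer fails the check, as does a file with a bad checksum.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

// Hypothetical helper class; the name is an assumption, not part of Talend.
public class GzipCheck {

    /**
     * Returns true if the whole file decompresses without error.
     * Returns false instead of throwing, so a caller can skip the file
     * and carry on with the next one.
     */
    public static boolean isGzipIntact(Path file) {
        byte[] buffer = new byte[8192];
        try (InputStream raw = Files.newInputStream(file);
             InputStream in = new GZIPInputStream(raw)) {
            // Read to the end and discard the data. A stream that ends
            // early or has a bad CRC/length trailer raises an IOException
            // (EOFException or ZipException) along the way.
            while (in.read(buffer) != -1) {
                // nothing to do, we only care whether reading succeeds
            }
            return true;
        } catch (IOException e) {
            // Not a GZIP file, unreadable, truncated, or corrupt.
            return false;
        }
    }
}
```

In a job, something like this could presumably live in a custom routine and be called once per iterated file, with the result deciding whether the file is processed or skipped. How exactly that is wired up depends on the job design.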
willm1
Creator

I faced a similar issue in the past, so I agree with you. I've created a Jira ticket to have this looked at and hopefully resolved soon: https://jira.talendforge.org/browse/TDI-33802
kleinmat
Contributor III
Author

Thank you, willm.

Let's hope Talend does something about it soon.
In TOS DI 6.0, the issue is still there, though.
Anonymous
Not applicable

Thanks for reporting this issue in our bugtracker.
Anonymous
Not applicable

Hi all,

For jobs where I expect a 'normal' failure (corrupted zip files are a good example), I put the processing in a 'child' job and uncheck the 'Die on child error' option.

The parent job manages the loop and/or iteration, and the child jobs handle only the processing.
This could be a workaround...

Hope it helps.

regards
laurent