We are experiencing the same issue. A given reload task will fail during the night, but when I rerun it in the morning it runs without error. It only happens occasionally, and it is very frustrating to try to track down the cause only to find nothing.
It would be nice to have something that automatically restarts a failed task a set number of times.
If you have a time-based schedule, have you considered triggering the job (from a different job) to run multiple times within the desired time frame? You would need to make sure the "queue if already running" setting is unchecked (you don't want it to queue); this is in the trigger task settings.
For example, if the job takes 30 minutes to run successfully, but when it fails it does so within 5 or 10 minutes, you could have the trigger job kick it off every 5 minutes from minute 0 to minute 20 or so.
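To see why this works, here is a small simulation of the idea. It is not Publisher code or any Publisher API. It only assumes the behavior described above: a trigger that fires while the task is still running is dropped rather than queued. The function name `simulate` and the run durations (7 minutes for a failing run, 30 for a successful one) are made up for illustration.

```python
from typing import Iterable, List, Optional, Tuple


def simulate(trigger_minutes: Iterable[int],
             durations: List[int]) -> List[Tuple[int, str]]:
    """Replay non-queuing triggers against a single reload task (hypothetical model).

    trigger_minutes: minutes (from the start of the window) at which a trigger fires.
    durations: length in minutes of each run that actually starts, in order
               (a failed run is short, a successful run is long). Must contain
               at least one entry per run that gets started.
    Returns (minute, "start" | "skip") for every trigger.
    """
    events: List[Tuple[int, str]] = []
    running_until: Optional[int] = None  # minute the in-progress run ends; None if idle
    runs = iter(durations)
    for t in trigger_minutes:
        if running_until is not None and t < running_until:
            # Task is still running and queuing is off, so this trigger is dropped.
            events.append((t, "skip"))
            continue
        # Task is idle: the trigger starts a new run, whether or not the last one succeeded.
        running_until = t + next(runs)
        events.append((t, "start"))
    return events


if __name__ == "__main__":
    # First run fails after 7 minutes; the retry at minute 10 runs the full 30 minutes.
    print(simulate(range(0, 21, 5), [7, 30]))
```

Running it prints `[(0, 'start'), (5, 'skip'), (10, 'start'), (15, 'skip'), (20, 'skip')]`: the trigger at minute 10 picks up the failure, and the later ones are ignored because the good run is still going. The catch is that the triggers cannot tell success from failure, so the whole window (20 minutes here) has to be shorter than a successful run (30 minutes). Otherwise a trigger that fires after a good run finishes would reload the task again.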
Just a thought.
Good ol' Publisher... It works when it works...