My first step would be to profile the load on the server at the moment the failure happens. I'd recommend setting your performance logging interval to 5 minutes so you can track how many documents are loaded and how much RAM is in use.
How many documents are loaded and how many user sessions are active when the failure occurs? Check the event log for document loads to see whether the failure occurs after a specific document is loaded. You can also turn on audit logging to see whether a specific activity precedes the failure.
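
Once you have a few hours of 5-minute samples, it can help to pull out just the entries leading up to the failure. Here is a rough Python sketch of that idea. It assumes you can export the performance log as comma-delimited text. The column names (timestamp, docs_loaded, sessions, ram_mb), the timestamp format, and the sample values are all made up for illustration, so adjust them to match what your log actually contains.

```python
import csv
import io
from datetime import datetime, timedelta


def samples_before_failure(log_text, failure_time, window_minutes=30):
    """Return the performance samples recorded in the window before failure_time.

    The column names used here are hypothetical; rename them to match your log.
    """
    start = failure_time - timedelta(minutes=window_minutes)
    rows = []
    for row in csv.DictReader(io.StringIO(log_text)):
        ts = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S")
        if start <= ts <= failure_time:
            rows.append({
                "timestamp": ts,
                "docs_loaded": int(row["docs_loaded"]),
                "sessions": int(row["sessions"]),
                "ram_mb": float(row["ram_mb"]),
            })
    return rows


# Made-up sample data standing in for an exported performance log.
SAMPLE_LOG = """timestamp,docs_loaded,sessions,ram_mb
2024-01-01 10:00:00,3,12,4096
2024-01-01 10:05:00,4,15,5120
2024-01-01 10:10:00,7,31,7900
"""

# Hypothetical failure time taken from the event log.
failure = datetime(2024, 1, 1, 10, 12, 0)
recent = samples_before_failure(SAMPLE_LOG, failure, window_minutes=10)
for s in recent:
    print(s["timestamp"], s["docs_loaded"], s["sessions"], s["ram_mb"])
```

If the document count, session count, or RAM figure jumps in the last sample or two before each failure, that tells you which of the three to look at first in the event and audit logs.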