Anonymous
Not applicable

File size

I am trying to run some CSV data through the tool, and it keeps crashing at the end of the add-dataset process. The file itself is about 2.8GB, which is what I may end up dealing with if I pull from an HDFS datastore. Is there a maximum file size that the tool can ingest, and is there a way to tune the Java stack?

14 Replies
Anonymous
Not applicable
Author

What's the error message? There is no explicit limit on file size, but you may get an OutOfMemory exception when processing a large data set. Can you please provide more details about your job?
Anonymous
Not applicable
Author

When the file load gets to 100%, the application just throws a generic error. 
" Server error
 An error occurred"


There's an 'x' to close the message and that's it.  The file is being loaded locally, can be opened in a text application, and iterated through in other applications just fine.  The system being used is Windows 7 64-bit with an i7 processor, and 16GB of RAM.
Anonymous
Not applicable
Author

Can you please do the following:

Quit data prep
Delete the app.log file
Run data prep and reproduce your scenario
Attach your app.log here

So where is app.log? On Windows: C:\Users\ \AppData\Roaming\talend\dataprep\logs\app.log
Let me know if you are on Mac (the file is hidden but there is a trick).
Don't worry, app.log only captures the software's activity and detailed error logs; there is no information about your data in it. Feel free to check it out in a text editor too.

Free Desktop is designed to load the entire dataset in memory; as a safeguard however, it will arbitrarily limit a preparation to 10,000 rows. This is of course not a hard limit (our code being open source that would be silly wouldn't it), just a measure to prevent users from crashing the app... Well obviously that didn't work well in your case 🙂  The app.log should tell us what went wrong.
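To illustrate the idea of a row cap, here is a minimal Python sketch of reading at most 10,000 rows from a large CSV while streaming the file. This is not the tool's actual code; the function name `read_sample` and its parameters are hypothetical, and the file is assumed to be UTF-8 and comma-delimited.

```python
import csv
from itertools import islice


def read_sample(path, limit=10_000, delimiter=","):
    """Return at most `limit` rows from a CSV file without reading the rest.

    Hypothetical helper for illustration only; it is not part of the tool.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        # islice stops pulling rows from the reader after `limit` rows, so
        # memory use depends on the sample size, not on the size of the file.
        return list(islice(reader, limit))
```

A cap applied this way never touches the remaining rows, which is why an out-of-memory error during loading would suggest the whole file is being read somewhere before the cap takes effect.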
Thanks!
Anonymous
Not applicable
Author

The log file should be attached, but in a cursory review it looks like it is Java heap space.
app.log.log
Anonymous
Not applicable
Author

I can't seem to see the attachment. Can you try to zip it perhaps?
I am looking forward to looking into it because, again, DP Free Desktop precisely cuts off at 10K to avoid your very error. So I'd love to understand why it doesn't do it in your case.
Anonymous
Not applicable
Author

We'll see if this works.  I noticed when I uploaded the file it added the extension twice.

appLog.zip_20160216-1430.zip
The link should be: www.talendforge.org/forum/img/members/314254/appLog.zip_20160216-1430.zip
Anonymous
Not applicable
Author

Hi Brian
How many rows does the file contain (roughly) ?
Thanks
Anonymous
Not applicable
Author

There are about 8.8 million lines in the file, which isn't atypical for the data I work with.
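For a sense of scale, the figures quoted in this thread can be turned into a rough per-row estimate. The sketch below assumes "2.8GB" means 2.8 × 10^9 bytes and only measures raw text on disk; the in-memory footprint inside a JVM would be larger and is not estimated here.

```python
# Figures quoted in the thread; the byte interpretation of "GB" is an assumption.
FILE_SIZE_BYTES = 2.8e9   # "about 2.8GB", taken as 2.8 * 10**9 bytes
ROW_COUNT = 8.8e6         # "about 8.8 million lines"
ROW_CAP = 10_000          # the 10,000-row safeguard mentioned earlier

avg_row_bytes = FILE_SIZE_BYTES / ROW_COUNT
capped_bytes = avg_row_bytes * ROW_CAP

print(f"average row size: {avg_row_bytes:.0f} bytes")
print(f"raw size of {ROW_CAP:,} rows: {capped_bytes / 1e6:.1f} MB")
```

Under these assumptions a row averages roughly 318 bytes, so 10,000 rows amount to only about 3.2 MB of raw text.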
Anonymous
Not applicable
Author

To add to the previous question (sorry for the hassle):
1) what is the actual content and format of the file, is it a list of transactions, or a log file etc?
2) what is the delimiter? how many columns do you see in a text editor?
3) where is your file located, local drive?
Any chance you can share the first row, or the first 2, 3, or even 10 rows, after masking what you (understandably!) wouldn't share on a public forum?
Just trying to understand what is so special about this file. There is indeed an out-of-memory error, and this is pretty unexpected.
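If it helps with answering the questions above, here is a small Python sketch that reports the line count, a guessed delimiter, the column count, and the first few rows of a large file without loading it all into memory. The function name `describe_csv` is hypothetical, the file is assumed to be UTF-8 text, and the delimiter guess from `csv.Sniffer` can be wrong on unusual files, so the sample rows are worth checking by eye (and masking) before posting.

```python
import csv
from itertools import islice


def describe_csv(path, sample_rows=10, encoding="utf-8"):
    """Summarise a CSV file by streaming it (hypothetical helper)."""
    # Count lines in binary mode, one line at a time, so the file is never
    # held in memory as a whole.
    with open(path, "rb") as raw:
        line_count = sum(1 for _ in raw)

    with open(path, newline="", encoding=encoding, errors="replace") as handle:
        head = handle.read(64 * 1024)  # small sample for delimiter detection
        try:
            delimiter = csv.Sniffer().sniff(head, delimiters=",;\t|").delimiter
        except csv.Error:
            delimiter = ","  # fallback guess when detection fails
        handle.seek(0)
        reader = csv.reader(handle, delimiter=delimiter)
        rows = list(islice(reader, sample_rows))

    return {
        "lines": line_count,
        "delimiter": delimiter,
        "columns": len(rows[0]) if rows else 0,
        "first_rows": rows,
    }
```

Calling `describe_csv` with the path to the file returns a dictionary whose `first_rows` entry holds the sample to review.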