Anonymous
Not applicable

[resolved] tSortRow and Large Files

I am just starting with TOS 5.6.0 and am trying to sort a large CSV file (2.5GB, 11M rows, 45 columns). I have set the JVM heap to 2GB and tried various buffer sizes for the external sort on the Advanced tab. The stack trace shows the out-of-memory error occurring in various places, but the result is always similar to:
Exception in thread "Thread-0" java.lang.OutOfMemoryError: Java heap space
at java.util.LinkedList.listIterator(LinkedList.java:667)
at java.util.AbstractList.listIterator(AbstractList.java:284)
at java.util.AbstractSequentialList.iterator(AbstractSequentialList.java:222)
at routines.system.RunStat.sendMessages(RunStat.java:261)
at routines.system.RunStat.run(RunStat.java:225)
at java.lang.Thread.run(Thread.java:662)
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.lang.StringBuilder.toString(StringBuilder.java:430)
at com.talend.csv.CSVReader.endColumn(CSVReader.java:131)
at com.talend.csv.CSVReader.readNext(CSVReader.java:301)
at johnmdm.sqlinout_0_1.SQLInOut.tFileInputDelimited_1Process(SQLInOut.java:3380)
at johnmdm.sqlinout_0_1.SQLInOut.runJobInTOS(SQLInOut.java:5199)
at johnmdm.sqlinout_0_1.SQLInOut.main(SQLInOut.java:5056)
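Since the heap is set to 2GB and the job still runs out of heap space, it may be worth confirming that the setting actually reaches the JVM that runs the job. A minimal sketch, assuming a standalone class (`HeapCheck` is a hypothetical name, not part of Talend) launched with the same JVM arguments as the job:

```java
public class HeapCheck {

    /** Maximum heap this JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        // With -Xmx2048m this should report a value close to 2048.
        System.out.println("Max heap available to this JVM: " + maxHeapMiB() + " MiB");
    }
}
```

If the reported value is far below 2048, the heap setting is probably being applied somewhere other than the job's own JVM.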
--john
2 Replies
Anonymous
Not applicable
Author

Hi
Take a look at this KB article. To resolve this error, store the data on disk instead of in memory: check the 'sort on disk' box on the advanced settings tab of the tSortRow component.
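To illustrate why sorting on disk avoids the heap error, here is a minimal sketch of a generic external merge sort in plain Java. It is not Talend's implementation of tSortRow; the class name `ExternalSortSketch`, the `maxLinesInMemory` parameter, and the line-based comparator are assumptions made for the example. Only one bounded chunk of rows is held in memory at a time; each sorted chunk is written to a temporary file, and the files are then merged.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class ExternalSortSketch {

    /** One open run file plus the line currently at its head. */
    private static final class Run {
        final BufferedReader reader;
        String head;

        Run(BufferedReader reader) throws IOException {
            this.reader = reader;
            this.head = reader.readLine();
        }
    }

    public static void sort(Path input, Path output, int maxLinesInMemory,
                            Comparator<String> order) throws IOException {
        List<Path> runs = new ArrayList<>();

        // Phase 1: read bounded chunks, sort each in memory, spill to a temp file.
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            List<String> chunk = new ArrayList<>(maxLinesInMemory);
            String line;
            while ((line = in.readLine()) != null) {
                chunk.add(line);
                if (chunk.size() >= maxLinesInMemory) {
                    runs.add(spill(chunk, order));
                    chunk.clear();
                }
            }
            if (!chunk.isEmpty()) {
                runs.add(spill(chunk, order));
            }
        }

        // Phase 2: k-way merge; the heap holds only one line per run file.
        PriorityQueue<Run> heap =
                new PriorityQueue<>((a, b) -> order.compare(a.head, b.head));
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (Path run : runs) {
                Run r = new Run(Files.newBufferedReader(run, StandardCharsets.UTF_8));
                if (r.head != null) heap.add(r); else r.reader.close();
            }
            while (!heap.isEmpty()) {
                Run r = heap.poll();
                out.write(r.head);
                out.newLine();
                r.head = r.reader.readLine();
                if (r.head != null) heap.add(r); else r.reader.close();
            }
        } finally {
            // Close any readers left open after a failure, then remove the temp files.
            for (Run r : heap) r.reader.close();
            for (Path run : runs) Files.deleteIfExists(run);
        }
    }

    /** Sorts one chunk in memory and writes it to a new temporary run file. */
    private static Path spill(List<String> chunk, Comparator<String> order) throws IOException {
        chunk.sort(order);
        Path run = Files.createTempFile("sort-run-", ".tmp");
        Files.write(run, chunk, StandardCharsets.UTF_8);
        return run;
    }
}
```

For a file keyed on an integer column followed by a text column (assuming `;` as the delimiter and no quoted fields), the comparator could be built as `Comparator.comparingInt((String l) -> Integer.parseInt(l.split(";", -1)[0])).thenComparing(l -> l.split(";", -1)[1])`. Peak memory is governed by `maxLinesInMemory`, not by the size of the input file, at the cost of extra disk I/O.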
Best regards
Shong
Anonymous
Not applicable
Author

Hi, I am trying to solve a performance issue when sorting a huge file (50 million records, 6 columns) on an integer column plus an alphabetic column. tSortRow takes around 30 minutes with sort on disk enabled.
I am using TOS 5.6.2 and evaluating this sort for my POC. Please advise on an optimized job design.