talendtester
Creator III

CSV options 2.4m row limit?

I changed TOS_DI-win-x86_64.ini to:

-vmargs
-Xms15120M
-Xmx20480M
-XX:MaxPermSize=18048m
-XX:+UseParallelGC
-Dfile.encoding=UTF-8
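
As a side note, to confirm those settings are actually being picked up, a tiny sketch like the following (a hypothetical HeapCheck class; run standalone, or adapt the body into a tJava component) prints the maximum heap the JVM really got:

// Hypothetical sketch: confirm the -Xmx value from the .ini was applied.
public class HeapCheck {
    public static void main(String[] args) {
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + (maxBytes / (1024 * 1024)) + " MB");
    }
}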

 

My input file has 11.6M rows with 83 columns per row.

 

My job:

tFileInputDelimited > tFileOutputDelimited

 

The tFileInputDelimited component has "CSV options" checked under Advanced settings.

 

I am trying to split the file into several smaller output files, but the job always fails at 2,402,585 rows.

 

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Unknown Source)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(Unknown Source)
at java.lang.AbstractStringBuilder.append(Unknown Source)
at java.lang.StringBuilder.append(Unknown Source)
at com.talend.csv.CSVReader.readNext(CSVReader.java:288)
at talenddemosjava.myJob_ver3_0_1.myJob_ver3.tFileList_1Process(myJob_ver3.java:7313)
at talenddemosjava.myJob_ver3_0_1.myJob_ver3.runJobInTOS(myJob_ver3.java:14507)
at talenddemosjava.myJob_ver3_0_1.myJob_ver3.main(myJob_ver3.java:14352)

 

When I changed the job to this, it worked successfully:

tFileInputFullRow > tFileOutputDelimited

 

Is there a maximum of 2,402,585 rows when CSV options is enabled?

2 Replies
vapukov
Master II

Hi,

 

You can check 2,402,585 * 83 * size_of_all_columns. What result do you get?
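
For illustration only, assuming (purely as an example) an average of 20 bytes per column value: 2,402,585 rows * 83 columns * 20 bytes ≈ 4 GB of raw character data, before any Java object overhead.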

 

If you have already split it into several files, did you test with 12 files of 1M rows each? If you are splitting anyway, why not split to a more moderate size?

 

Also, is tFileInputDelimited > tFileOutputDelimited your full job design? If there is no transformation, why do you need this job at all? Or do you have something between the components?

 

talendtester
Creator III
Author

Thanks vapukov.

 

I split the input into 12 files of 1M rows each, but was still getting the Java heap space error.

 

Next, I split it into files of 300K rows each, but was still getting the Java heap space error.

 

Then I turned on "Check each row structure against schema". It turned out there were about 6 rows that had \" in one of the columns, which was throwing off the column parsing.

 

I changed the Escape char to "\\" and then the job ran successfully!
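
For anyone hitting the same error, my understanding (an assumption on my part, but consistent with the stack trace in my first post) is that without an escape character, a stray \" leaves the reader believing a quoted field never closed, so it keeps appending the rest of the file into one StringBuilder until the heap is exhausted. A deliberately simplified sketch of that failure mode, not Talend's actual CSVReader code:

import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class RunawayFieldDemo {
    // Simplified quoted-field reader with no escape handling: it appends
    // characters until it sees a closing quote. A stray " inside a field
    // shifts the quote boundaries, so a later field swallows everything.
    static String readQuotedField(Reader in) throws IOException {
        StringBuilder field = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (c == '"') {
                return field.toString(); // closing quote found: field complete
            }
            field.append((char) c);      // still inside the field: keep growing
        }
        return field.toString();         // EOF without a closing quote
    }

    public static void main(String[] args) throws IOException {
        // The stray " after "value" unbalances the quoting, so the second
        // read never finds a closing quote and buffers everything to EOF.
        Reader in = new StringReader("value\",rest,of,row\nnext row... imagine millions more");
        System.out.println("field 1: " + readQuotedField(in));
        System.out.println("field 2: " + readQuotedField(in));
    }
}

With the Escape char set to "\\", the \" sequence is read as a literal quote inside the field rather than a field boundary, so the buffer never runs away.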