talendtester
Creator III

CSV options 2.4m row limit?

I changed TOS_DI-win-x86_64.ini to:

-vmargs
-Xms15120M
-Xmx20480M
-XX:MaxPermSize=18048m
-XX:+UseParallelGC
-Dfile.encoding=UTF-8
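
As a side note, to confirm those settings are actually being picked up, a tiny sketch like the following (a hypothetical HeapCheck class; run standalone, or adapt the body into a tJava component) prints the maximum heap the JVM really got:

// Hypothetical sketch: confirm the -Xmx value from the .ini was applied.
public class HeapCheck {
    public static void main(String[] args) {
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + (maxBytes / (1024 * 1024)) + " MB");
    }
}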

 

My input file has 11.6M rows with 83 columns per row.

 

My job:

tFileInputDelimited > tFileOutputDelimited

 

The tFileInputDelimited component has "CSV options" checked under Advanced settings.

 

I am trying to split the file into several smaller output files, but the job always fails at 2,402,585 rows.

 

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Unknown Source)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(Unknown Source)
at java.lang.AbstractStringBuilder.append(Unknown Source)
at java.lang.StringBuilder.append(Unknown Source)
at com.talend.csv.CSVReader.readNext(CSVReader.java:288)
at talenddemosjava.myJob_ver3_0_1.myJob_ver3.tFileList_1Process(myJob_ver3.java:7313)
at talenddemosjava.myJob_ver3_0_1.myJob_ver3.runJobInTOS(myJob_ver3.java:14507)
at talenddemosjava.myJob_ver3_0_1.myJob_ver3.main(myJob_ver3.java:14352)

 

When I changed the job to this, it worked successfully:

tFileInputFullRow > tFileOutputDelimited

 

Is there a maximum of 2,402,585 rows when CSV options is enabled?

2 Replies
vapukov
Master II

Hi,

 

You can check 2,402,585 * 83 * size_of_all_columns. What result do you get?
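
For illustration only, assuming (purely as an example) an average of 20 bytes per column value: 2,402,585 rows * 83 columns * 20 bytes ≈ 4 GB of raw character data, before any Java object overhead.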

 

If you have already split it into several files, did you test with 12 files of 1M rows each? If you are splitting anyway, why not split to a more moderate size?

 

Also, is tFileInputDelimited > tFileOutputDelimited your full job design? If there is no transformation, why do you need this job at all? Or do you have something between the components?

 

talendtester
Creator III
Author

Thanks vapukov.

 

I split the input into 12 files of 1M rows each, but was still getting the Java heap space error.

 

Next, I split it into files of 300K rows each, but was still getting the Java heap space error.

 

Then I turned on "Check each row structure against schema". It turned out there were about 6 rows that had \" in one of the columns, which was throwing off the column parsing.

 

I changed the Escape char to "\\" and then the job ran successfully!
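
For anyone hitting the same error, my understanding (an assumption on my part, but consistent with the stack trace in my first post) is that without an escape character, a stray \" leaves the reader believing a quoted field never closed, so it keeps appending the rest of the file into one StringBuilder until the heap is exhausted. A deliberately simplified sketch of that failure mode, not Talend's actual CSVReader code:

import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class RunawayFieldDemo {
    // Simplified quoted-field reader with no escape handling: it appends
    // characters until it sees a closing quote. A stray " inside a field
    // shifts the quote boundaries, so a later field swallows everything.
    static String readQuotedField(Reader in) throws IOException {
        StringBuilder field = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (c == '"') {
                return field.toString(); // closing quote found: field complete
            }
            field.append((char) c);      // still inside the field: keep growing
        }
        return field.toString();         // EOF without a closing quote
    }

    public static void main(String[] args) throws IOException {
        // The stray " after "value" unbalances the quoting, so the second
        // read never finds a closing quote and buffers everything to EOF.
        Reader in = new StringReader("value\",rest,of,row\nnext row... imagine millions more");
        System.out.println("field 1: " + readQuotedField(in));
        System.out.println("field 2: " + readQuotedField(in));
    }
}

With the Escape char set to "\\", the \" sequence is read as a literal quote inside the field rather than a field boundary, so the buffer never runs away.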