Hi,
I'm trying to migrate an MSSQL table with 13 million rows to a MySQL one, but after 700K rows the job stops and returns an error message:
Out of memory: GC overhead limit exceeded
I added a tUniqRow component to filter out duplicate rows, so the flow is:
tMSSqlInput > tUniqRow > tMap > tMysqlOutput (unique rows)
and the duplicates flow: tUniqRow > tMap > tMysqlOutput (duplicates)
Can someone help me configure this?
If you need more info, feel free to ask.
I already tried the tUniqRow advanced setting to store data on disk, but I got the same result.
Exception in thread "Thread-1" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.nio.CharBuffer.wrap(Unknown Source)
at java.lang.StringCoding$StringEncoder.encode(Unknown Source)
at java.lang.StringCoding.encode(Unknown Source)
at java.lang.String.getBytes(Unknown Source)
at com.mysql.jdbc.StringUtils.getBytes(StringUtils.java:505)
at com.mysql.jdbc.PreparedStatement.setString(PreparedStatement.java:3721)
at usage.load_cdr_0_1.LOAD_CDR.tMSSqlInput_1Process(LOAD_CDR.java:10777)
at usage.load_cdr_0_1.LOAD_CDR$1.run(LOAD_CDR.java:13900)
Hi all,
I haven't looked into the source code of the MSSQL / MySQL components, but I think they work row by row. Please correct me if I'm wrong (and it looks like I am, otherwise there wouldn't be an OutOfMemory error). Adding a tUniqRow, however, will be a memory bottleneck; do not use it on high-volume loads unless you really need it.
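To make that concern concrete, here is a minimal sketch of the general pattern behind such a unique filter (this is not Talend's generated code, just an illustration): every key already seen has to be remembered somewhere, so with millions of distinct rows the key set alone can eat the whole heap unless it is spilled to disk.

import java.util.HashSet;
import java.util.Set;

public class UniqueFilterSketch {

    // All keys seen so far stay in memory; with millions of distinct
    // keys this set alone can exhaust the heap.
    private final Set<String> seenKeys = new HashSet<String>();

    // Returns true the first time a key is seen, false for duplicates.
    public boolean isFirstOccurrence(String key) {
        return seenKeys.add(key);
    }
}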
@Yogesh: Any idea where the memory is used? Does tMap store the full data from the main flow before processing (as it does for a lookup)?
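In the meantime it might also be worth giving the job more heap via the JVM arguments (Run tab > Advanced settings); the values below are only placeholders, adjust them to your machine:

-Xms1024M
-Xmx4096M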
Bye
Volker
Thank you guys for your answers and questions,
Actually, I solved the problem by adding a date dimension table to filter the big table, driven by a tFlowToIterate.
It now loads my target table with the 13 million rows from the MSSQL table in 40 minutes.
For your info, I tried both solutions, with "store data on disk" and without, and the result is exactly the same.
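In plain JDBC terms the idea looks roughly like this (only a sketch: the connection strings, table and column names are made up, and the actual job uses tFlowToIterate with the standard input/output components):

import java.sql.*;
import java.util.ArrayList;
import java.util.List;

public class ChunkedLoadSketch {
    public static void main(String[] args) throws SQLException {
        Connection src = DriverManager.getConnection("jdbc:sqlserver://...");
        Connection dst = DriverManager.getConnection("jdbc:mysql://...");

        // 1) Read the dates from the dimension table first
        //    (this is the part tFlowToIterate plays in the job).
        List<Date> days = new ArrayList<Date>();
        Statement dim = src.createStatement();
        ResultSet dates = dim.executeQuery("SELECT cdr_date FROM dim_date");
        while (dates.next()) {
            days.add(dates.getDate("cdr_date"));
        }
        dates.close();
        dim.close();

        // 2) Load the big table one day at a time, so only a small
        //    slice of the 13 million rows is in flight per iteration.
        PreparedStatement in = src.prepareStatement(
                "SELECT id, caller, callee FROM big_cdr WHERE cdr_date = ?");
        PreparedStatement out = dst.prepareStatement(
                "INSERT INTO target_cdr (id, caller, callee) VALUES (?, ?, ?)");
        for (Date day : days) {
            in.setDate(1, day);
            ResultSet rows = in.executeQuery();
            while (rows.next()) {
                out.setLong(1, rows.getLong("id"));
                out.setString(2, rows.getString("caller"));
                out.setString(3, rows.getString("callee"));
                out.addBatch();
            }
            rows.close();
            out.executeBatch(); // flush one day's chunk before the next
        }

        in.close();
        out.close();
        src.close();
        dst.close();
    }
}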
Hakim