Hello,
I need to move millions of rows from an MSSQL server to Cassandra, and I'm using tCassandraOutputBulkExec. It works fine for thousands of records, but once the row count reaches the hundreds of thousands, the job starts to slow down because garbage collection can no longer free enough memory.
Is there a way to periodically flush the SSTable and garbage-collect the rows that are no longer needed? Should I use tCassandraOutput instead?
Hi,
Is there any error message printed on the console when your job starts to slow down? Could you please show us a screenshot of your tCassandraOutputBulkExec component settings?
Best regards,
Sabrina
Hey! To put it simply, the JVM runs out of memory during garbage collection. My guess is that the component tries to build one massive SSTable while holding all the data in memory. I should mention that the table I'm importing has around 38 columns. I've included screenshots of a successful run, the error, and the component settings.
With a week's worth of data the job completes successfully, but it fails when loading more than one month.
Instead of loading millions of rows in one go, can you try breaking them into chunks and loading those? You can create a subjob that processes one chunk at a time and issue an explicit GC request with a tJava component, as sketched below.
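For reference, a minimal sketch of what that tJava component could contain; the memory logging is an illustrative assumption, not something Talend generates for you:

// Inside a tJava component, run after each chunk's subjob completes.
Runtime rt = Runtime.getRuntime();
long usedBefore = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);

// Ask the JVM to collect garbage between chunks. System.gc() is only a
// hint, but it gives the collector a chance to reclaim the buffers from
// the previous chunk before the next one is read from MSSQL.
System.gc();

long usedAfter = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
System.out.println("Heap used: " + usedBefore + " MB -> " + usedAfter + " MB after GC");

Wire the tJava to the end of each chunk's subjob with an OnSubjobOk trigger. The chunking itself would typically be driven by a parent job that iterates over row ranges, for example by feeding an OFFSET/FETCH window into the tMSSqlInput query through a context variable.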
Thanks,
Sankalp