
Anonymous
Not applicable

Bulk loading into Cassandra

Hello,
I need to move millions of rows from an MSSQL server to Cassandra, and I'm using tCassandraOutputBulkExec. It works fine for thousands of records, but as soon as the row count reaches the hundreds of thousands, the job slows down because garbage collection can no longer reclaim memory.
Is there a way to periodically generate the SSTable and garbage-collect the rows that are no longer needed? Should I use tCassandraOutput instead?
4 Replies
Anonymous
Not applicable
Author

Hi,
Is there any error message printed on the console when your job starts to slow down? Could you please show us a screenshot of your tCassandraOutputBulkExec component settings?
Best regards
Sabrina
Anonymous
Not applicable
Author

Hey! To put it simply, the JVM runs out of memory during garbage collection. My guess is that the component tries to build one massive SSTable while holding all the data in memory. It should be noted that the table I'm importing has around 38 columns. I've included screenshots of a successful run, the error, and the component settings.
When loading a week's worth of data, the job completes successfully, but it fails on more than a month.
(Screenshots attached: successful run, error message, component settings.)
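A quick way to see the heap filling up is to log memory use from a tJava component while the job runs. A minimal sketch (java.lang.Runtime needs no extra imports in a tJava snippet):

    // tJava snippet: print current heap usage so the growth is visible on the console
    Runtime rt = Runtime.getRuntime();
    long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
    long maxMb = rt.maxMemory() / (1024 * 1024);
    System.out.println("Heap: " + usedMb + " MB used of " + maxMb + " MB max");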
Anonymous
Not applicable
Author

Instead of loading millions of rows in one go, can you try breaking them into chunks? You can create a subjob that processes each chunk and issue an explicit GC request with a tJava component, as sketched below.
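For example, a tJava wired in after each chunk's subjob (via an OnSubjobOk trigger) could look like this; note that System.gc() is only a hint, and the JVM decides when to actually collect:

    // tJava snippet run after each chunk finishes loading:
    // request a garbage-collection cycle before the next chunk starts
    long freeBefore = Runtime.getRuntime().freeMemory() / (1024 * 1024);
    System.gc(); // only a request; the JVM may defer or ignore it
    long freeAfter = Runtime.getRuntime().freeMemory() / (1024 * 1024);
    System.out.println("Free heap: " + freeBefore + " MB -> " + freeAfter + " MB");

The chunks themselves could come from a paged query in the MSSQL input component, for example (table, column, and context variable names here are just placeholders, and OFFSET/FETCH needs SQL Server 2012 or later):

    "SELECT * FROM my_table ORDER BY id OFFSET " + context.offset +
    " ROWS FETCH NEXT " + context.chunkSize + " ROWS ONLY"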
Thanks,
Sankalp
Anonymous
Not applicable
Author

Hey, I'll probably go that route.
Thanks for the answers.