I’m still new to TOS and figuring things out as I go along. Below is the next issue I’ve run into.
I initially had a job with a PostgreSQLInput component that extracted 5 columns from one table, a tMap in which I performed some data type conversions and created some date fields, and an output that loaded the data back into PostgreSQL. That ran reasonably fast for a source table with fewer than 100,000 records or so. Then I received a source table with 32 million records and performance ground to a halt, around 70 rows/sec, even after I increased the JVM arguments to -Xms2560M and -Xmx7000M on my laptop.
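To give an idea of what the tMap was doing per row, the conversions were along these lines (a sketch only; the column names, formats and sample values below are made up, the real job just did the same kind of text-to-integer and text-to-date work plus an extra date field):

import java.text.SimpleDateFormat;
import java.util.Date;

public class ConversionSketch {
    public static void main(String[] args) throws Exception {
        // Made-up sample values standing in for the source columns
        String rawId = "42";
        String rawOrderDate = "2012-05-07";

        int id = Integer.parseInt(rawId);                                        // text -> integer
        Date orderDate = new SimpleDateFormat("yyyy-MM-dd").parse(rawOrderDate); // text -> date
        Date loadDate = new Date();                                              // extra date field created in the tMap

        System.out.println(id + ";" + orderDate + ";" + loadDate);
    }
}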
I figured it must have been the tMap component, in which I was perhaps performing too many conversions, so I split those out into a dedicated tConvertType. That didn’t improve performance either.
Then I decided to create a test job (screenshot attached) and reduced the complexity to just the PostgreSQLInput component with a simple select of 5 columns and a tFileOutputDelimited component writing to a flat file. For the 32 million record dataset, the job takes ages to even start processing and eventually fails with a Java heap space error.
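For reference, that test job boils down to something like the plain JDBC read below (a sketch only; the connection string, credentials, table and column names are made up). I don’t know whether the code Talend generates fetches rows one at a time like this or buffers the whole result set first.

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ExtractSketch {
    public static void main(String[] args) throws Exception {
        // Made-up connection details
        try (Connection con = DriverManager.getConnection(
                 "jdbc:postgresql://localhost:5432/mydb", "user", "secret");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(
                 "SELECT col1, col2, col3, col4, col5 FROM big_table");
             BufferedWriter out = new BufferedWriter(new FileWriter("out.csv"))) {

            // Write each row as a semicolon-delimited line
            while (rs.next()) {
                out.write(rs.getString(1) + ";" + rs.getString(2) + ";"
                        + rs.getString(3) + ";" + rs.getString(4) + ";"
                        + rs.getString(5));
                out.newLine();
            }
        }
    }
}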
I then recreated a similar job in SSIS, and SSIS loaded all 32 million records in 3 minutes and 42 seconds, at an average of 144,144 rows/sec.
In the meantime, I’ve played with the record counts:

- 1M records: 232,612 rows/sec (still good performance)
- 10M records: 240,028 rows/sec
- 20M records: 76,377 rows/sec
- 30M records: 20,065 rows/sec

Anything over 31M records seemed to freeze the job.