<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic tPostgresQLInput performance scaling issue in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/tPostgresQLInput-performance-scaling-issue/m-p/2316066#M86648</link>
<description>&lt;P&gt;I’m still new to Talend Open Studio (TOS) and figuring things out as I go. Below is the next issue I have.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;I initially had a job with a tPostgresqlInput component that extracted 5 columns from one table, a tMap in which I did some data type conversions and created some date fields, and then loaded the data back into PostgreSQL. That ran reasonably fast for a source table with fewer than 100,000 records or so. Then I received a source table with 32 million records and performance ground to a halt at about 70 rows/sec, even after I increased the JVM arguments to -Xms2560M and -Xmx7000M on my laptop.&lt;/P&gt; 
&lt;P&gt;I figured it must have been the tMap component, in which I was perhaps performing too many conversions, so I split it out and used a dedicated tConvertType. Performance didn’t improve either.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;Then I decided to create a test job (attached screenshot) and reduced the complexity to only the tPostgresqlInput component with a simple SELECT of just 5 columns and a tFileOutputDelimited component writing to a flat file. For the 32 million record dataset, the job would take ages to even start processing and eventually fail with a Java heap space error.&lt;/P&gt; 
&lt;P&gt;I then recreated a similar job in SSIS, and SSIS loaded all 32 million records in 3 minutes and 42 seconds at an average speed of 144,144 rows/sec.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;I also searched the forums here and came across another user with similar performance issues for PostgreSQL. &lt;U&gt;&lt;A href="https://community.qlik.com/s/feed/0D53p00007vCs2DCAS" target="_blank"&gt;https://community.talend.com/t5/Design-and-Development/tPostgresqlInput-Query-Slow-Through-Talend-Yet-Fast-When-Running/td-p/54319&lt;/A&gt;&lt;/U&gt;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;In the meantime, I’ve played with some numbers: fetching 1M records still gave good performance at 232,612 rows/sec; 10M ran at 240,028 rows/sec; 20M dropped to 76,377 rows/sec; and 30M dropped to 20,065 rows/sec. Above 31M records the job seemed to freeze.&lt;/P&gt; 
&lt;P&gt;Does anyone know what’s going on?&lt;/P&gt; 
</description>
    <pubDate>Wed, 03 Jun 2020 18:59:21 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2020-06-03T18:59:21Z</dc:date>
    <item>
      <title>tPostgresQLInput performance scaling issue</title>
      <link>https://community.qlik.com/t5/Talend-Studio/tPostgresQLInput-performance-scaling-issue/m-p/2316066#M86648</link>
<description>&lt;P&gt;I’m still new to Talend Open Studio (TOS) and figuring things out as I go. Below is the next issue I have.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;I initially had a job with a tPostgresqlInput component that extracted 5 columns from one table, a tMap in which I did some data type conversions and created some date fields, and then loaded the data back into PostgreSQL. That ran reasonably fast for a source table with fewer than 100,000 records or so. Then I received a source table with 32 million records and performance ground to a halt at about 70 rows/sec, even after I increased the JVM arguments to -Xms2560M and -Xmx7000M on my laptop.&lt;/P&gt; 
&lt;P&gt;I figured it must have been the tMap component, in which I was perhaps performing too many conversions, so I split it out and used a dedicated tConvertType. Performance didn’t improve either.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;Then I decided to create a test job (attached screenshot) and reduced the complexity to only the tPostgresqlInput component with a simple SELECT of just 5 columns and a tFileOutputDelimited component writing to a flat file. For the 32 million record dataset, the job would take ages to even start processing and eventually fail with a Java heap space error.&lt;/P&gt; 
&lt;P&gt;I then recreated a similar job in SSIS, and SSIS loaded all 32 million records in 3 minutes and 42 seconds at an average speed of 144,144 rows/sec.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;I also searched the forums here and came across another user with similar performance issues for PostgreSQL. &lt;U&gt;&lt;A href="https://community.qlik.com/s/feed/0D53p00007vCs2DCAS" target="_blank"&gt;https://community.talend.com/t5/Design-and-Development/tPostgresqlInput-Query-Slow-Through-Talend-Yet-Fast-When-Running/td-p/54319&lt;/A&gt;&lt;/U&gt;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;In the meantime, I’ve played with some numbers: fetching 1M records still gave good performance at 232,612 rows/sec; 10M ran at 240,028 rows/sec; 20M dropped to 76,377 rows/sec; and 30M dropped to 20,065 rows/sec. Above 31M records the job seemed to freeze.&lt;/P&gt; 
&lt;P&gt;Does anyone know what’s going on?&lt;/P&gt; 
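&lt;P&gt;One pattern that would explain the heap error and the memory scaling with row count (an assumption on my part, not something I have confirmed for this job) is that the PostgreSQL JDBC driver buffers the entire result set in memory unless autocommit is disabled and a non-zero fetch size is set, in which case it streams rows through a server-side cursor instead. I believe tPostgresqlInput has a cursor option in its advanced settings that corresponds to this. A minimal plain-JDBC sketch of the streaming setup (URL, credentials, and table name are placeholders):&lt;/P&gt; 

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class StreamingFetch {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details -- adjust for your environment.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "secret")) {
            // Both settings are needed for the PostgreSQL JDBC driver to use
            // a server-side cursor instead of buffering every row on the heap:
            conn.setAutoCommit(false);       // 1. autocommit off
            try (Statement st = conn.createStatement()) {
                st.setFetchSize(10_000);     // 2. non-zero fetch size per round trip
                try (ResultSet rs = st.executeQuery(
                        "SELECT col1, col2, col3, col4, col5 FROM big_table")) {
                    long n = 0;
                    while (rs.next()) {
                        n++;                 // process row; memory use stays flat
                    }
                    System.out.println(n + " rows");
                }
            }
        }
    }
}
```

&lt;P&gt;If that is the cause, it would also explain why raising -Xmx helps only up to a point: heap demand grows linearly with row count until the buffer no longer fits.&lt;/P&gt; 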
</description>
      <pubDate>Wed, 03 Jun 2020 18:59:21 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/tPostgresQLInput-performance-scaling-issue/m-p/2316066#M86648</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2020-06-03T18:59:21Z</dc:date>
    </item>
  </channel>
</rss>

