FGuijarro
Contributor III

How to increase rows/second read

Hi,

I'm trying to create a new table with 2 columns.

That table is the result of joining 11 tables (15 columns and 20,000,000 rows per table).
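(For illustration only, with hypothetical table and column names, the logical operation is roughly the single statement below; the sketch issues it over plain JDBC so the database performs the join:)

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class JoinShapeSketch {
        public static void main(String[] args) throws SQLException {
            // Hypothetical connection details and table/column names.
            String url = "jdbc:mariadb://dbhost:3306/mydb";
            String sql =
                "CREATE TABLE result_2cols AS "
              + "SELECT t1.key_col, t11.value_col "
              + "FROM t1 "
              + "JOIN t2 ON t2.key_col = t1.key_col "
              // ... t3 through t10 join on the same key ...
              + "JOIN t11 ON t11.key_col = t1.key_col";
            try (Connection conn = DriverManager.getConnection(url, "user", "password");
                 Statement stmt = conn.createStatement()) {
                stmt.executeUpdate(sql); // the join runs inside the database
            }
        }
    }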

The read speed started out excellent (around 40,000 rows/sec), but then it slowed down considerably; for the last 2 hours it has been running at 4,000 rows/sec.

How can I increase that speed? The entire process and its current speed are attached below.

Thank you.

(attached screenshot: the job and its current read speed)

4 Replies
Anonymous
Not applicable

Hello,

What's the size of your RAM? Which Talend product are you using?

Generally speaking, the following aspects can affect job performance:

1. The volume of data: reading a very large data set will degrade performance.

2. The structure of the data: if there are many columns on tDBRow, it will consume a lot of memory and time to transfer the data during job execution (see the sketch after this list).

3. The database connection: the job usually runs better if the database is installed locally; if the database is on another machine, even over a VPN, you may run into congestion and latency issues.
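For point 2 especially, it can help to make sure the JDBC driver streams rows in batches instead of buffering the whole result set in heap. This is only a minimal plain-JDBC sketch, with hypothetical connection details and table/column names:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class StreamingReadSketch {
        public static void main(String[] args) throws SQLException {
            // Hypothetical connection details -- replace with your own.
            String url = "jdbc:mariadb://dbhost:3306/mydb";
            try (Connection conn = DriverManager.getConnection(url, "user", "password");
                 Statement stmt = conn.createStatement()) {
                // A bounded fetch size asks the driver to stream rows in batches
                // instead of materializing the entire result set in memory.
                // (MySQL Connector/J is a special case: it only streams with
                // stmt.setFetchSize(Integer.MIN_VALUE).)
                stmt.setFetchSize(1000);
                try (ResultSet rs = stmt.executeQuery("SELECT col1, col2 FROM big_table")) {
                    long rows = 0;
                    while (rs.next()) {
                        rows++; // process each row as it arrives
                    }
                    System.out.println("Read " + rows + " rows");
                }
            }
        }
    }

In Talend, the equivalent knob is usually the cursor/fetch size (or stream) option on the database input component, if your component exposes one.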

Best regards

Sabrina

FGuijarro
Contributor III
Author

Hi xdshi,

Thanks for your answer.

I'm using Talend 7.3.1.20200219_1130.

RAM is 8 GB on a server with a 6-core processor.

The memory configuration in Talend is Xms2049M, Xmx8192M.

The 11 tables have around 80,000,000 rows each, and each table is between 20 GB and 40 GB.

 

The problem I have is that the run gives me 2 different error messages:

  • "java.sql.SQLNonTransientConnectionException: (conn=-248284684) unexpected end of stream, read 50 bytes from 84 (socket was closed by server)"

and

  • "java.lang.OutOfMemoryError: Java heap space"
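From what I've read, the first error usually means the server closed the connection, for example after a timeout while the job was stalled. A sketch of what could be run at the start of the session to lengthen the server-side timeouts, assuming a MariaDB/MySQL server (the variable names come from their documentation):

    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class SessionTimeoutSketch {
        // Lengthen server-side timeouts for this session so a long, slow read
        // is not cut off mid-stream; assumes a MariaDB/MySQL server.
        static void relaxTimeouts(Connection conn) throws SQLException {
            try (Statement stmt = conn.createStatement()) {
                stmt.execute("SET SESSION net_write_timeout = 600"); // seconds
                stmt.execute("SET SESSION wait_timeout = 28800");    // seconds
            }
        }
    }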

Thanks for your help!!

 

Anonymous
Not applicable

Hello,

 

Does this performance issue occur for one job or for all jobs?

If all jobs:

Could you add your workspace to your antivirus exclusion list?

Could you disable the drive indexing done by the operating system?

 

If only this job:

What is the allocated heap size? (the Xmx value in the ini file)

Where is the database located: on a local drive or a remote one?

 

Can you check whether performance improves if you enable parallelization on the job (right-click / Enable parallelization)?
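If it helps, a quick way to confirm the heap the job's JVM actually received is to print it from the job itself; a minimal sketch (the body of main could go into a tJava component):

    public class HeapCheck {
        public static void main(String[] args) {
            // maxMemory() reports the -Xmx ceiling the JVM was started with
            // (or the platform default if no flag reached this JVM).
            long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
            System.out.println("Max heap available to this JVM: " + maxMb + " MB");
        }
    }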

 

FGuijarro
Contributor III
Author

Hi tsesdl,

Thanks for your answer! I've tried with only one table and the problem also occurs.

The system my Talend is running on is:

  • Intel Xeon Gold CPU @ 2.30 GHz (2 processors)
  • Windows Server 2019 64-bit
  • RAM: 8 GB

If I check the JVM memory assigned from a command prompt (although I understand this starts a fresh JVM, so it may only show the defaults rather than what the Run tab passes to the job):

java -XshowSettings:vm

VM settings:

   Max. Heap Size (Estimated): 1.78G

   Ergonomics Machine Class: client

   Using VM: Java HotSpot(TM) 64-Bit Server VM

 

In Talend, before running, I always change the memory in the "Run" tab to:

-Xms256M

-Xmx8016M

 

However, the file TOS_DI-win-x86_64.ini contains:

-vmargs

-Xms512m

-Xmx1536m

-Dfile.encoding=UTF-8

-Dosgi.requiredJavaVersion=1.8

-XX:+UseG1GC

-XX:+UseStringDeduplication

-XX:MaxMetaspaceSize=512m

 

Running the job with parallelization, I get this error message:

"Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded"

 

Thanks for your help!