castiellll
Contributor III

Job memory performance

Hello,

 

I have a job that takes data from multiple ODS tables, joins them with multiple tMaps, and inserts the result into a target table. I should have around 80 GB of data, and the main flow has around 85,000,000 rows (around 15 GB).

 

All the lookup tables are stored in temp files, and 25 GB of RAM is available for this job.

The inserts are batched, with manual commit.
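For context, the batch/manual-commit pattern is roughly equivalent to this JDBC sketch (the connection string, table, and row type are placeholders, not the real job code):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.List;

public class BatchInsertSketch {

    // Stand-in for one row of the main flow coming out of the tMaps.
    record Row(long id, String val) {}

    static List<Row> sourceRows() {
        return List.of(new Row(1L, "a"), new Row(2L, "b")); // placeholder data
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical URL and credentials, not the real job's.
        try (Connection con = DriverManager.getConnection(
                "jdbc:sqlserver://dbhost:1433;databaseName=ods", "user", "secret")) {
            con.setAutoCommit(false); // manual commit

            try (PreparedStatement ps = con.prepareStatement(
                    "INSERT INTO target_table (id, val) VALUES (?, ?)")) {
                final int batchSize = 10_000; // tuning knob
                int n = 0;
                for (Row r : sourceRows()) {
                    ps.setLong(1, r.id());
                    ps.setString(2, r.val());
                    ps.addBatch();
                    if (++n % batchSize == 0) {
                        ps.executeBatch();
                        con.commit(); // commit per batch to bound the transaction size
                    }
                }
                ps.executeBatch(); // flush the remainder
                con.commit();
            }
        }
    }
}
```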

Even with this, the job is quite slow and has been running for several days without finishing.

 

Is there another kind of optimization I can do, besides converting the Talend tMaps to SQL code?

The problem is clearly not coming from the SQL engine.

 

What do you think is the average time for Talend to process 80 to 100 GB of data?


 

Thanks in advance.

 

Regards,

Sofiane

 

15 Replies
castiellll
Contributor III
Author

Hello,

 

We are working on these flows together with the DBA, and everything is OK on that side.

 

As for the bulk component, I've read that the running job should be on the same server as the SQL Server, which is not the case for me.

 

Thanx

Sofiane

JaneYu
Contributor III

Did you try increasing this option in the Run tab: "Use specific JVM arguments", by raising Xms and Xmx?

The default is Xms256M, Xmx1024M.

You could increase to Xms1024M, Xmx4096M.
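In the JVM arguments list of the Run tab that would look like this (adjust the values to the memory actually free on the machine):

```
-Xms1024M
-Xmx4096M
```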

castiellll
Contributor III
Author

Hello,

 

I am already using 30 GB of RAM, and this is not a RAM problem.

 

Thanx

Sofiane

David_Beaty
Creator III

Hi,

 

The message you are getting says that the connection is closed. So this could really be one of two things:

 

  1. The server is closing the connection.
  2. Your Talend job is closing the connection.

 

The most likely cause is the first option; check with the DBAs whether there are any open-connection timeouts, etc. Is the destination on-premise, in the cloud, or elsewhere (with network contention)? Either way, I'd consider splitting the job into two distinct sections: one that accumulates the data you want to put into the DB (into a temp file), and one that actually outputs the data into the DB (from temp file to DB).
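Here's a minimal sketch of that first stage in plain Java, with placeholder data; in Talend terms it would simply be the main flow feeding a tFileOutputDelimited, with the DB load done in a separate subjob:

```java
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;

public class StageToFileSketch {
    public static void main(String[] args) throws Exception {
        // Stage 1: accumulate the transformed rows into a delimited temp file,
        // so the long-running transformation never holds a DB connection open.
        Path staging = Files.createTempFile("ods_load_", ".csv");
        try (PrintWriter out = new PrintWriter(Files.newBufferedWriter(staging))) {
            out.println("1;a"); // placeholder rows; the real job streams the main flow here
            out.println("2;b");
        }
        System.out.println("Staged to " + staging);
        // Stage 2 (separate subjob, fresh connection): load the file into the DB
        // with batched inserts or a bulk-load component.
    }
}
```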

 

castiellll
Contributor III
Author

David,

 

Thank you for your reply.

 

It certainly doesn't come from the server; as I said before, I'm working with the DBA on it.

 

Is there a way to set the Talend timeout to 0? (I don't know where that parameter is; I only know that the JDBC timeout defaults to 0.)

 

The infrastructure is all on-premise, and the Talend server is connected to the SQL Server over the intranet.

 

Is it possible to use the bulk component when the Talend job isn't on the same server as SQL Server? Or should I manage the temp file manually?

 

Have a good day.

Sofiane.

David_Beaty
Creator III

There may be some additional connection parameters regarding inactivity timeouts that you could pass in the “Additional parameters” field.
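For example, with Microsoft's JDBC driver, appending something like the snippet below in that field should disable the socket read timeout. Property names and defaults vary between driver versions, so check the documentation for the version you're on:

```
;socketTimeout=0
```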

Previously, for SQL Server, I've only done the bulk load from a file on the server; it was a file share I could write to. However, for Vertica, the bulk file could be remote.
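The reason for that constraint is that BULK INSERT resolves the file path on the SQL Server side, not on the machine sending the statement, so the file (or share) must be readable by the server's service account. A minimal sketch, with hypothetical share and table names:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class RemoteBulkLoadSketch {
    public static void main(String[] args) throws Exception {
        // The job machine only sends the statement; SQL Server itself opens the
        // file, so the UNC path must point to a share the server can read.
        try (Connection con = DriverManager.getConnection(
                "jdbc:sqlserver://dbhost:1433;databaseName=ods", "user", "secret");
             Statement st = con.createStatement()) {
            st.execute(
                "BULK INSERT dbo.target_table "
              + "FROM '\\\\fileshare\\staging\\rows.csv' "
              + "WITH (FIELDTERMINATOR = ';', ROWTERMINATOR = '\\n', BATCHSIZE = 100000)");
        }
    }
}
```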

Maybe create yourself a small test job to try it. I’m not able to test it right now to confirm.

Thanks