Yose
Contributor

Talend job inserting 3 million rows from a positional file into a SQL Server staging table is very slow

My requirement is to load data from a positional file into a staging table (truncate and insert). The positional file resides on a file server, SQL Server is installed on the DB server, and the Talend job runs on the file server, so BULK INSERT can't be used here.

I created the job in Talend and it inserts the data correctly, but the process is very slow: it handles only 100-200 rows/s. I have tried different ways to improve the performance, with no luck.

Here are the process and system details:

System RAM: 4 GB / 2 processors (Intel Xeon)

OS: Windows Server 2012 R2 / Windows 7

Talend Data Management version: 7.3.1

Approaches I tried:

1. Used tFileInputPositional (to read the positional file) -> tMap (no joins in the tMap) -> tDBOutput over a JDBC connection.

2. Updated the JVM parameters to -Xms256M and -Xmx1536M.

3. Enabled parallelization, creating 2 threads from tFileInputPositional -> tMap and departitioning before tDBOutput. Even after this, performance did not improve.

  

Currently the Talend job runs for more than an hour and a half to load 3 million rows into the SQL Server staging table. I can insert the same 3 million records into that table in 10 minutes using SSIS. How can I improve the Talend job's performance and bring the processing time down to about 10 minutes, as with SSIS?
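For scale, the gap can be quantified from the numbers in this post alone (simple arithmetic, no Talend specifics assumed): matching SSIS's 10 minutes for 3 million rows requires about 5,000 rows/s, a 25x to 50x speedup over the current 100-200 rows/s, which row-by-row inserts over JDBC are unlikely to reach.

```python
rows = 3_000_000
target_seconds = 10 * 60             # SSIS loads the same data in ~10 minutes

required = rows / target_seconds     # rows/s needed to match SSIS
current_low, current_high = 100, 200 # observed Talend throughput

print(required)                      # 5000.0 rows/s
print(required / current_high)       # 25.0  (speedup needed, best case)
print(required / current_low)        # 50.0  (speedup needed, worst case)
```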

Please let me know if anyone has any suggestions.

Thanks in advance!

5 Replies
gjeremy1617088143

Hi, maybe you can send the result of the tMap into a tHashOutput, then read the hash back with tHashInput to insert the data with tDBOutput. Also check that your tDBOutput is configured as "Insert" only, not "Update or insert".

Send me love and kudos

Yose
Contributor
Author

Hi Jeremy,

I tried what you suggested above, and now I'm getting a memory error in the tHashOutput component: "The paging file is too small for this operation to complete."

Could you please tell me how to resolve this memory issue in the tHashOutput component?

Thanks in advance

gjeremy1617088143

Otherwise, if tHashOutput doesn't solve the problem, you could try writing the data to a temporary folder; the option is in the basic settings of the tMap component:

https://help.talend.com/r/HqPypUO_cLctKv_dAWcgFw/GKM0cFBcfseClZruxOzlTw

You could also increase the commit size in the tDBOutput component.
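To illustrate why the commit size matters (this is a general database-client sketch, not Talend's generated code; Talend emits Java, and the component and table names here are illustrative): committing once per row forces a round trip and a log flush per row, while committing every N rows amortizes that cost across the batch. A minimal, runnable sketch using Python's built-in sqlite3:

```python
import sqlite3

def load_rows(rows, batch_size):
    """Insert rows into a staging table, committing every batch_size rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staging (id INTEGER, payload TEXT)")
    cur = conn.cursor()
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            cur.executemany("INSERT INTO staging VALUES (?, ?)", batch)
            conn.commit()          # one commit per batch, not per row
            batch.clear()
    if batch:                      # flush the final partial batch
        cur.executemany("INSERT INTO staging VALUES (?, ?)", batch)
        conn.commit()
    count = conn.execute("SELECT COUNT(*) FROM staging").fetchone()[0]
    conn.close()
    return count

# 25,000 sample rows, committed in batches of 10,000
n = load_rows(((i, "row") for i in range(25_000)), batch_size=10_000)
print(n)  # 25000
```

With a real SQL Server target the same principle applies through tDBOutput's batch size and "Commit every" settings: larger batches mean fewer round trips and fewer transaction commits.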

 

Yose
Contributor
Author

Even if I write the data to a temp folder, will that improve the job's performance?

gjeremy1617088143

There are a lot of parameters that can influence your job's performance. It would help to have an idea of what you do in your tMap and how your tDBOutput is configured; it would be best if you could post pictures of your job.

Also, 4 GB of RAM is too small for a Talend project; you should consider using 8 GB minimum.

Try the jTDS driver instead of the standard JDBC driver; it can be much faster.
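For reference, the two connection URL formats look like this (host, port, and database name below are placeholders, not values from this thread). Note also that if you stay on Microsoft's mssql-jdbc driver, version 7.0+ supports a `useBulkCopyForBatchInsert=true` connection property that routes batched inserts through the SQL Server bulk copy API; since it works over the network connection, it does not require the file to sit on the DB server, so it may be worth testing against your driver version.

```
# jTDS (placeholder host/db):
jdbc:jtds:sqlserver://dbserver:1433/mydb

# Microsoft mssql-jdbc 7.0+, with bulk copy for batched inserts:
jdbc:sqlserver://dbserver:1433;databaseName=mydb;useBulkCopyForBatchInsert=true
```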