Anonymous
Not applicable

How to Reduce the CPU utilization while running talend jobs

Hi all,
I have around 40 jobs, all running continuously. The problem is that the jobs are using more than 150% CPU.
We run the Talend jobs on Red Hat, and the Talend version is 5.2.2.
Each job picks up a file from a directory, transforms it, and writes the result to another file. tMap is used for the transformation.
If CPU usage stays like this, all our jobs will be of no use. Please suggest how I can reduce CPU utilization.

Thanks and Regards,
Akshath
8 Replies
Anonymous
Not applicable
Author

Hi Akshath,
When you execute a single job, what is the CPU utilization?
Out of the 40 jobs, which job has the highest CPU utilization?
Are the jobs scheduled individually, or are they part of a single job?
What are the file sizes?
What complex transformations do you have?
How much memory do you have?
What is the processor?
What other processes are running, and what is their CPU utilization when the job executes?
Could you please get some information on this? We might get some clue from the above checks.
Thanks
Vaibhav
Anonymous
Not applicable
Author

Hi,
1) A single job also takes more than 100% CPU.
2) All jobs have the same structure; only the transformation differs. So all jobs use more than 100% or 150% CPU, and sometimes it reaches 250%.
3) Jobs are scheduled individually. All the jobs' .sh files are called from a Perl script.
4) The maximum file size is 37 MB; the minimum is 200 KB.
5) The transformations are:
i) Date formatting, to a specific format.
ii) Lookup from a Vertica table for some columns.
iii) Checking conditions for null values.
iv) Catching the error reject link from tMap.
6) RAM is 126 GB, and we are using a 24-core CPU.
7) The CPU usage of other processes is not affected when I run the Talend jobs.
Note: When we read the file list, we check in a MySQL database table whether each file has already been processed. If it has not, we take that file and process it. After each file is processed, we update the MySQL table to mark it as processed. For this task we use a MySQL procedure, which is called using tMysqlSP.
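To make the check-then-mark flow in the note concrete, here is a minimal sketch in Python. It is only an illustration, not the actual job: an in-memory SQLite database stands in for MySQL, and the table name `processed_files`, the column `file_name`, and all three function names are hypothetical. The real job does this through a stored procedure called from tMysqlSP.

```python
import sqlite3


def create_tracking_table(conn):
    # Hypothetical table: one row per file that has already been processed.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_files (file_name TEXT PRIMARY KEY)"
    )
    conn.commit()


def is_processed(conn, file_name):
    # Check whether this file was recorded by an earlier run.
    row = conn.execute(
        "SELECT 1 FROM processed_files WHERE file_name = ?", (file_name,)
    ).fetchone()
    return row is not None


def mark_processed(conn, file_name):
    # Record the file after it has been handled; repeated calls are harmless.
    conn.execute(
        "INSERT OR IGNORE INTO processed_files (file_name) VALUES (?)", (file_name,)
    )
    conn.commit()
```

Each file costs two database round trips in this shape (one check, one update), which may be worth keeping in mind when many small files arrive.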
Regards,
Akshath Hegde
Anonymous
Not applicable
Author

Is this CPU usage the sum across all CPUs, or for a single CPU?
Can you please analyze and show the output of "top", "iostat -x", and "uname -r" (before and after executing the job)?
Is this issue only with these jobs, or do other jobs have a similar problem?
Vaibhav
willm1
Creator

tayana_akki - have you tried specifying the amount of memory for the job to use - in the job's Job tab? This will dictate how much your JVM requests when it's instantiated...
Anonymous
Not applicable
Author

I can't see any issue with 100% CPU, provided it is doing productive work. You would, of course, want to use all of the resources available.
It sounds like you're polling a directory for the arrival of files. Are you sleeping for a period of time between polls, to allow some breathing space and for fresh files to arrive?
You say you are checking your database to see if a file has already been loaded. Aren't you archiving loaded files to make this unnecessary? You'd also want to keep the number of files in a directory as low as possible.
I think you should start by looking at your throughput and whether it is reasonable. Looking at CPU alone is not the best approach.
That said, with jobs that sound like they mostly read from and write to disk, you'd expect CPU to be a lot lower, I think.
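For what it's worth, the sleep-between-polls idea can be sketched roughly as below. This is Python rather than a Talend job, and the function name `poll_directory`, its parameters, and the 5-second default are all assumptions made for illustration. The handler is assumed to move or delete each file it processes, otherwise the same file would be picked up again on the next poll.

```python
import os
import tempfile
import time


def poll_directory(path, handle_file, idle_sleep_seconds=5.0,
                   max_polls=None, sleep=time.sleep):
    """Process files found in `path`; sleep only when the directory is empty."""
    processed = []
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        names = sorted(
            name for name in os.listdir(path)
            if os.path.isfile(os.path.join(path, name))
        )
        if not names:
            # Nothing to do: give up the CPU instead of re-listing in a tight loop.
            sleep(idle_sleep_seconds)
            continue
        for name in names:
            # handle_file is expected to move or delete the file when done.
            handle_file(os.path.join(path, name))
            processed.append(name)
    return processed


if __name__ == "__main__":
    # Demo on a throwaway directory; os.remove stands in for the real job.
    demo_dir = tempfile.mkdtemp()
    open(os.path.join(demo_dir, "a.csv"), "w").close()
    print(poll_directory(demo_dir, os.remove, idle_sleep_seconds=0.1, max_polls=3))
```

Without the sleep, an empty directory makes the loop spin on the directory listing, which alone can keep a core near 100%.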
Anonymous
Not applicable
Author

sanvaibhav,
The requested screenshots are uploaded. These screenshots were taken after the load starts.
willm,
I think you're suggesting the heap memory. I checked the heap size: Xms is 256 MB and Xmx is 1024 MB.
tal00000,
Loaded files will not be checked again, because they are moved to a different directory. We keep track of all files in one MySQL table, which is why we check the table each time.
About the continuous polling: yes, even if no file is there, the job keeps checking the directory for files. I can try this with some sleep value.

Thanks to all for the valuable advice. Please check the screenshots and let me know what I can do.
Regards,
Akshath Hegde
0683p000009MDrl.png 0683p000009MDoQ.png
Anonymous
Not applicable
Author

sanvaibhav,
Screenshots from before the load.
Regards,
Akshath Hegde
0683p000009MDhW.png 0683p000009MDkU.png
Anonymous
Not applicable
Author

Hi Akshath,
Your Perl programs are also using about 99% CPU... I think this should not be an issue. Your data flow and operations involve custom computations, so they use more CPU... Check the explanation in the 15th comment at http://superuser.com/questions/457624/why-is-the-top-command-showing-a-cpu-usage-of-799
200% CPU usage on a 24-core CPU is very low... I don't think it will hamper your ETL code. For your 24-core CPU, 2400% would be the maximum usage.
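A quick sketch of that arithmetic, assuming top is reporting per-process %CPU in its default mode, where 100% means one fully used core. The function name `share_of_machine` is made up for this example.

```python
def share_of_machine(top_cpu_percent, core_count):
    """Convert a top %CPU figure (100% = one full core) into a share of the whole machine."""
    if core_count <= 0:
        raise ValueError("core_count must be positive")
    max_percent = 100.0 * core_count  # 24 cores give a ceiling of 2400%
    return top_cpu_percent / max_percent * 100.0


if __name__ == "__main__":
    # The 250% peak mentioned earlier in the thread, on the 24-core box.
    print(round(share_of_machine(250, 24), 1))
```

On that basis, a job showing 250% in top is using roughly a tenth of a 24-core machine.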
There are two Java applications or programs consuming more CPU... If you have used custom code to perform computations, try using Talend components in place of the Java code. This will improve the job's efficiency.
If you can post a screenshot and brief details of how you implemented the transformation, that will give us some idea and help identify the problem.
How are you doing the following?
1) Date formatting to a specific format: what logic is involved?
2) How are you looking up the Vertica tables? Are there any joins with the incoming data?
3) ....
Vaibhav