
_AnonymousUser
Specialist III

100% CPU on a virtual machine vs 30% on local development

Hi everybody,

Context:
The job converts data from a table containing ~300k rows, handling deduplication (tUniqRow) along the way. The table has roughly 250 columns (heavy, but there is no other way to deal with it).
We are using TOS, SQL Server, and components such as tUniqRow.
It would be possible to do without that component, but that is not the point of this question.

Problem:
When the job runs on my local machine: 30% CPU usage.
When it runs on the production environment, which is a virtual machine: 100% CPU usage and the following Java heap space error: Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
Task Manager does indeed show 100% CPU usage, while RAM sits at 3 of the 8 GB available.

Both CPUs are almost identical: a Xeon X5675 on the local machine and an X5660 on the VM.


Question:
Is any particular configuration required (or advised) for installing/running Talend on a VM?

Thanks for your time.
A.

1 Reply
Anonymous
Not applicable

How many other virtual machines share the physical host's resources? What are the total resources of the physical machine, and what is assigned to each VM (RAM, CPUs, etc.)? This is a memory issue, so if you have more memory available on your Studio machine, that would explain why it is not suffering as much.

Have you set the JVM settings for the Job? If so, what are they?


[screenshot: JVM settings in the Job's Run tab]
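For reference, the JVM arguments are set under the Job's Run tab > Advanced settings. On a VM with 8 GB of RAM, a starting point might look like the following (the values are illustrative only; tune them to your workload and leave headroom for the OS and other VMs):

```
-Xms1024M
-Xmx4096M
```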

The tUniqRow component is the likely cause of this. It needs to hold all of the records in memory. Are the records coming from the same DB? If so, can you not handle the uniqueness of the dataset in the DB? It is *MUCH* more efficient to do it there. No matter what anyone tells you, Talend is meant to be used in collaboration with other tools. If you have lots of records coming from different sources, it is a great tool for dealing with uniqueness, etc. But if your data comes from a single DB, it is wasteful to pull that much data into memory just to process it in Java. Yes, Talend can do it, but it is a waste of resources.

You can use a query in your DBInput component that handles uniqueness, so the work is done by the DB. The data arriving in Talend is then already unique, and you can do whatever else you need to do without hitting this memory issue.
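As a sketch of that approach for SQL Server (table and column names here are placeholders; substitute your own), the query pasted into the tDBInput component could look like this:

```sql
-- Keep one row per business key; ROW_NUMBER() performs the
-- deduplication inside SQL Server instead of in the Talend JVM.
SELECT *
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (
               PARTITION BY key_col1, key_col2  -- the columns tUniqRow keyed on
               ORDER BY key_col1
           ) AS rn
    FROM dbo.source_table AS t
) AS numbered
WHERE rn = 1;
```

With this, only already-unique rows cross the wire, so the 300k x 250-column dataset never has to be buffered in the job's heap.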