mborlo15
Contributor III

Global Map During Parallelization, Iteration, and Multi Threading

Does anyone have any insight into this warning in the Talend documentation? It makes it sound like we cannot use the global map when doing multi-threading, parallelization, etc.

 

This does not seem right though since that would be very limiting. I appreciate that Talend provides these warnings, but more details and perhaps examples would be useful.

 

Also, to clarify, I am asking this question not just about the "iterate" connection's parallelization feature, but also about tParallelize and the multi-threading options in the configurations found on the job tab.

 

We are currently avoiding using Talend's parallelization feature because of this one warning.

 

Product: Talend Data Integration 7.3

 

https://help.talend.com/r/en-US/7.3/studio-user-guide-open-studio-for-data-integration/launching-par...

0695b00000OBa1IAAT.png

1 Solution

Accepted Solutions
Anonymous
Not applicable

This warning is telling you that the globalMap is not thread safe, because it is essentially an instance of java.util.HashMap, which does not synchronise concurrent access (the Java documentation for HashMap covers this). However, you can use it in a way that mitigates thread safety issues during parallelisation. For example, the globalMap is not shared between jobs. So if you are running multiple child jobs in parallel, using the globalMap inside each of the child jobs will be OK. If you are using the globalMap in a single job with parallel processing, this is where you need to be careful, since several threads may then read and write the same map.
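
To illustrate the "one globalMap per child job" point, here is a minimal plain-Java sketch. It is not Talend-generated code: the class ChildJobSketch and the methods runChildJob and runInParallel are hypothetical names invented for this example. Each simulated child job creates and fills its own HashMap, so no map is ever touched by more than one thread while it is being written.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; Talend's generated job classes look different.
class ChildJobSketch {

    // Stands in for one child job. The map is local to this call,
    // so only the thread running this job ever writes to it.
    static Map<String, Object> runChildJob(int jobId) {
        Map<String, Object> globalMap = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            globalMap.put("row_" + i, jobId);
        }
        globalMap.put("jobId", jobId);
        return globalMap;
    }

    // Runs the given number of child jobs in parallel and collects their maps.
    static List<Map<String, Object>> runInParallel(int jobs) {
        ExecutorService pool = Executors.newFixedThreadPool(jobs);
        try {
            List<Future<Map<String, Object>>> futures = new ArrayList<>();
            for (int j = 0; j < jobs; j++) {
                final int jobId = j;
                Callable<Map<String, Object>> task = () -> runChildJob(jobId);
                futures.add(pool.submit(task));
            }
            List<Map<String, Object>> results = new ArrayList<>();
            for (Future<Map<String, Object>> future : futures) {
                // Future.get() also makes the job's writes visible to this thread.
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        for (Map<String, Object> map : runInParallel(4)) {
            System.out.println("job " + map.get("jobId") + " holds " + map.size() + " entries");
        }
    }
}
```

If the four tasks wrote into one shared HashMap instead, the outcome would no longer be predictable, which is the situation the warning is about.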


4 Replies

mborlo15
Contributor III
Author

 

@Richard Hall​ 

Thank you! Just what I was looking for!

I would imagine this caution would also be needed for global variables generated by components, e.g. ((String)globalMap.get("tFileCopy_2_DESTINATION_FILENAME")), correct?

Anonymous
Not applicable

Hi @Matthew Borlongan​,

 

I am looking into this and will get back to you when I have some answers. I have never experienced issues with globalMap values stored about components, but I want to get a more thorough answer.

 

Regards

 

Richard

Anonymous
Not applicable

@Matthew Borlongan - OK, I have been speaking to R&D and have more I can tell you. First of all, these issues are all Java parallelisation issues. There is nothing here that you wouldn't experience with any other Java code generator. When you get to parallelisation, you need to think about thread safety and test for issues that may be introduced. The discussion I had did raise the suggestion of adding some functionality to the Studio to look for potential parallelisation code flaws. Such checks would not be perfect, but they could point to areas where you may need to think again. So thanks for raising this question, as it may have indirectly led to more functionality 🙂

 

Now, what can be done? The documentation omits a key feature that can protect your job from these parallelisation issues. If you go to your Job tab, then click on "Extra", you will see a "Multi thread execution" tick box. If you tick this and check the generated code, you will see that the globalMap variable is now synchronised. This will protect your job from the thread safety issues of the globalMap. There will be code similar to this....

 

private final java.util.Map<String, Object> globalMap = java.util.Collections
        .synchronizedMap(new java.util.HashMap<String, Object>());

 

.....used to create the globalMap. This will remove the thread safety issues BUT can increase the chances of performance issues and possible deadlock situations. So, like the non-thread-safe globalMap, it will need thorough testing if it is used. This is also why it is not on by default for every job.
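
As a rough illustration of what a synchronised globalMap does and does not give you, here is a small plain-Java sketch. It is not taken from Talend's generated code: the class SynchronizedGlobalMapSketch, its methods, and the "rowCount" key are hypothetical. Single calls such as put or get on a Collections.synchronizedMap are each protected, but a read-then-write sequence is still two separate calls, so the sketch holds the map's own lock around that sequence.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; names are invented for this example.
class SynchronizedGlobalMapSketch {

    // Same kind of construction as the generated line quoted above.
    private final Map<String, Object> globalMap =
            Collections.synchronizedMap(new HashMap<String, Object>());

    // A read-modify-write is two map calls, so it needs one lock around both.
    // The synchronized wrapper locks on itself, so we lock on globalMap.
    void increment(String key) {
        synchronized (globalMap) {
            Integer current = (Integer) globalMap.get(key);
            globalMap.put(key, current == null ? 1 : current + 1);
        }
    }

    // Starts several threads that all increment the same key, then
    // returns the final value once every thread has finished.
    static int runCounter(int threads, int incrementsPerThread) {
        SynchronizedGlobalMapSketch job = new SynchronizedGlobalMapSketch();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    job.increment("rowCount");
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return (Integer) job.globalMap.get("rowCount");
    }

    public static void main(String[] args) {
        System.out.println("rowCount = " + runCounter(4, 1000));
    }
}
```

The extra locking is also where the performance cost mentioned above comes from: every thread that touches the map has to wait its turn.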

 

Essentially, when working without parallelisation, this is not an issue that anyone would have to worry about. However, when you introduce parallelisation, you do need to consider it and these issues cannot just be removed with the click of a button or an arbitrary setting. You have the options available to you, but you need to ensure that you test thoroughly to ensure that you use the right combination of settings for your job.