BipinNS
Contributor

Understanding Talend Modules

I want to understand Talend modules.

When using tLibraryLoad or managing libraries for routines in Talend, there are options to add a jar: either pick one that is already available in the Platform list, or select a third-party jar and install it as a module. Classes from these modules can then be used within the job. Screenshot below:

[Screenshot: 0695b00000UyP8yAAF.png]

I'd like to understand the following about Talend modules:

  • Are Talend modules references to library jar files within Talend?
  • What are the ones listed under Platform? Are these the ones included in the default Talend Studio package, with anything additional needing to be installed as a new module?
  • If we add a library from Platform to a routine and later upgrade the Talend version, is upgrading the library referenced from the routine handled the same way as for Talend components (where it asks to use the new versions of the jars)?

 

I'm trying to use an external jar for one of my routines. Once this goes to production, in the long run, when we upgrade the Talend version or the Java version, will we have to explicitly remember to upgrade the external jar referenced in the routine? Or, since it is under Platform or installed as a module in Talend, will Talend Studio notify us about this when upgrading the Talend version?

Could someone please advise on the above?

6 Replies
Anonymous
Not applicable

Hello,

 

I'm glad you asked this question. The modules available in Talend can be viewed via the Modules view: https://help.talend.com/r/en-US/Cloud/studio-user-guide-api-services-platform/installing-external-modules-manually-using-modules-view

 

For the routine dependencies you have two options:

  • You maintain these dependencies yourself and follow the security upgrades.
  • You uncheck the "Required" checkbox for the dependency in the routine.

How does this Required checkbox work?

When the Required checkbox is enabled, the dependency is included in the final build. If it's unchecked, the dependency is only used to compile the routine. (Routines are compiled separately, before the jobs are built.) So how could one use this?

Let's say your routine uses AWS Secrets Manager to retrieve secrets. As the S3 component brings the aws-java-sdk-bundle, one could develop the routine by specifying the aws-java-sdk platform jar. If the Required checkbox is enabled, then in the beginning there's no visible difference.

Let's fast forward 12 months. The Talend S3 component now uses a newer jar. This jar might be only a security fix, or it might conflict with the one required by the routine. If the Required checkbox is unchecked, your final job build will include the jar that was upgraded by the Talend patch.
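
To make the effect of the Required checkbox concrete: a routine compiled against a jar that is not packaged into the job only works if some other jar on the run-time classpath provides the same classes. Below is a minimal Java sketch of such a check. `ClasspathProbe` and `isAvailable` are hypothetical names, not part of Talend, and the AWS class name is only an assumed example string.

```java
// Hypothetical helper, not part of Talend: reports whether a class can be
// loaded at run time, i.e. whether some jar in the job build provides it.
final class ClasspathProbe {

    static boolean isAvailable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not on the classpath, or present but not loadable in this JVM
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed example class name; replace it with a class your routine actually uses.
        String needed = "com.amazonaws.services.secretsmanager.AWSSecretsManager";
        System.out.println(needed + " available: " + isAvailable(needed));
    }
}
```

Calling something like this at the start of a job that relies on an unchecked dependency would surface a missing jar early instead of failing in the middle of a run.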

 

Another option is to use the custom mvn URI mapping and override the Maven versions before the pom generation / build: https://help.talend.com/r/en-US/Cloud/studio-user-guide-api-services-platform/overriding-external-modules-by-customizing-mvn-uri

 

If your jar is not used by any of the components then you're left with no choice but to maintain your jar manually. (Detect when it has new versions / security fixes, and take appropriate action.)

 

Hope this helps.

 

Regards,

Balázs

BipinNS
Contributor
Author

Hi Balazs, 

 

 

Great, this is what I wanted to understand. 

 

A follow-up question on the details you provided, for clarification:

  • Suppose my routine uses the platform jar zip4j version 1.3.3. After a Talend upgrade, the tFileArchive component starts using a newer version, say 2.11.0. If I had left Required unchecked in the routine, the routine will still be compiled against zip4j 1.3.3, but that jar won't be included in the job build. The build will only include zip4j 2.11.0 (used by the component). Is this correct?
    • If that's the case, will the routine work properly, given that it will still look for zip4j 1.3.3 when it runs?
  • If I have to include an external jar that is not available as a Talend platform jar, could you please advise on the best way to maintain the dependency? Is there a standard way to upgrade and maintain it over time?
Anonymous
Not applicable

Yes, that is correct. Hopefully 2.11 is backward compatible with 1.3.3.

The routine will look up the class on the classpath, which depends on the jars your JVM loads. (This can be a problem when multiple jars provide the same class: if 1.3 is loaded first, the CVE and performance fixes of 2.x won't take effect.)
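
If you are unsure which jar actually wins on the classpath, the JVM can tell you where a loaded class came from. A minimal sketch, assuming a hypothetical helper named `JarLocator`; it only uses the standard `ProtectionDomain` and `CodeSource` APIs.

```java
import java.security.CodeSource;

// Hypothetical helper: reports the jar or directory a class was loaded from.
final class JarLocator {

    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        // Classes loaded by the bootstrap loader (JDK core classes) have no code source
        if (source == null || source.getLocation() == null) {
            return "(no code source, e.g. JDK core class)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) {
        // In a routine, pass the library class you care about instead of these examples.
        System.out.println("JarLocator loaded from: " + locationOf(JarLocator.class));
        System.out.println("String loaded from: " + locationOf(String.class));
    }
}
```

Logging this once for the zip4j class used by the routine would show whether the 1.x or the 2.x jar was picked up in a given job run.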

 

For this external jar use case, I'd suggest setting up a reference project and maintaining the jar there. That way you can manage it centrally, and the rest of your projects will inherit it. We also have custom routine jars, which generate a dedicated jar file instead of altering routines.jar, see: https://help.talend.com/r/en-US/Cloud/studio-user-guide-api-services-platform/creating-custom-routine-jars

BipinNS
Contributor
Author

Thanks for the explanation, that makes sense.

 

In my case, zip4j version 1.3.3 needs to be used in a separate job, and that job's classpath will only include this jar version. Other jobs use tFileArchive but not the custom routine, so their classpath will only include the platform jar. With this setup, if a future release, say Talend 8, uses zip4j 2.10.0, that will only affect the jobs using tFileArchive, right? The job that uses the routine could still use zip4j 1.3.3 (keeping Required checked in the routine library config). I'm asking because zip4j 2.0.0 and later is a major rewrite, so code written against 1.3.3 won't work with 2.x without changes in the routine. Is there anything else that needs to be considered with this approach?

 

On central management using a referenced project and custom routine jars: this looks nice, thanks.

Is there a way to update the Maven settings in Talend projects so that they notify me when newer versions of a dependent jar are available? That way I wouldn't have to track all the external dependencies outside the platform jars myself.

Anonymous
Not applicable

Hello,

 

I'm not sure how zip4j versioning works. Maybe 1.x and 2.x can coexist, just like log4j and slf4j. Let's go through a few hypothetical scenarios:

  • Jar A and Jar B provide totally different classes and functions -> no problem.
  • Jar A and Jar B provide the same classes but different functions (doThing(String) vs. doThingImproved(String)). In this case we depend heavily on the JVM itself. If we only use Jar B, which provides both functions, we're good. If we have lib/JarA;lib/JarB on the classpath (the order is kind of random between jobs and local/remote executions, but consistent for the builds/runs afterwards), the class from Jar A is found first, so the newer jar and its extra functions won't be present. What you can do here is use tRunJob with the separate-process option and try to avoid tFileArchive in the jobs running in a separate JVM. Using the routine libs, it should be possible to automatically include those dependencies only for the jobs running in a separate JVM.
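
The second scenario can be detected at run time with plain reflection. Here is a minimal sketch, assuming hypothetical names throughout: `ApiCheck`, `hasMethod`, and the nested `OldApi` class, which stands in for the class as shipped in Jar A.

```java
// Hypothetical helper: checks whether the loaded version of a class exposes a given public method.
final class ApiCheck {

    static boolean hasMethod(Class<?> type, String name, Class<?>... parameterTypes) {
        try {
            type.getMethod(name, parameterTypes); // looks up public methods only
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // Stand-in for the class as provided by "Jar A": it has doThing but not doThingImproved.
    static final class OldApi {
        public String doThing(String input) {
            return input;
        }
    }

    public static void main(String[] args) {
        System.out.println("doThing present: " + hasMethod(OldApi.class, "doThing", String.class));
        System.out.println("doThingImproved present: " + hasMethod(OldApi.class, "doThingImproved", String.class));
    }
}
```

A routine that needs doThingImproved could run such a check first and fail with a clear message when the older jar was loaded.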

 

I personally would try to align with the libs that Talend provides (even if this means adding some components that I never trigger). By staying on the latest patch, one automatically benefits from security upgrades, even if this means that from time to time my routine has to be adjusted.

 

Maybe by having Dependabot check the routines' pom.xml, or by setting up a security scan on the final build inside CI/CD, one could be notified about risky jars. Our Jenkins pipeline can be easily expanded: https://help.talend.com/r/en-US/8.0/software-dev-lifecycle-best-practices-guide/ci-jenkins
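
Whatever tool reports a newer version, the distinction that matters for the routine is whether the update crosses a major version, as with zip4j 1.x to 2.x. A minimal sketch of that comparison, assuming plain dotted numeric versions only; `VersionCheck` and its methods are hypothetical names, and qualifiers such as "-SNAPSHOT" are not handled.

```java
// Hypothetical helper: compares dotted numeric versions such as "1.3.3" and "2.11.0".
final class VersionCheck {

    // Splits "2.11.0" into {2, 11, 0}; throws NumberFormatException on non-numeric parts.
    static int[] parse(String version) {
        String[] parts = version.trim().split("\\.");
        int[] numbers = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            numbers[i] = Integer.parseInt(parts[i]);
        }
        return numbers;
    }

    // Negative if left is older, positive if newer, zero if equal (missing parts count as 0).
    static int compare(String left, String right) {
        int[] a = parse(left);
        int[] b = parse(right);
        int length = Math.max(a.length, b.length);
        for (int i = 0; i < length; i++) {
            int x = i < a.length ? a[i] : 0;
            int y = i < b.length ? b[i] : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    // True when the candidate has a higher first (major) number than the current version.
    static boolean isMajorUpgrade(String current, String candidate) {
        return parse(candidate)[0] > parse(current)[0];
    }

    public static void main(String[] args) {
        System.out.println("newer available: " + (compare("1.3.3", "2.11.0") < 0));
        System.out.println("major upgrade: " + isMajorUpgrade("1.3.3", "2.11.0"));
    }
}
```

A pipeline step could use a check like this to let minor updates through and flag major ones for a manual review of the routine.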

 

hope this helps

BipinNS
Contributor
Author

Thanks a lot, really helpful.

 

We're also trying to build a CI pipeline with Jenkins, and will probably get back to you on that if we need any assistance.