We have a scenario in which the tFileArchive component is used to add 20-30K+ files and create a zip package. The package size ranges from 20 to 30 GB. However, tFileArchive takes quite a long time (~2 hours) to do this. I tried using the faster setting for the component, but there was still not much improvement.
I tried creating a routine using the zip4j Java library. With zip4j, the archiving time for the same volume dropped drastically, by more than 50%: I'm able to create the zip in less than an hour for the volume mentioned above.
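For comparison, a routine that streams a list of files into a single archive can also be sketched with only the JDK's built-in `java.util.zip` package. This is a swapped-in technique: it is not the zip4j code described above and not what tFileArchive does internally, and the class name `ZipRoutine` and method `zipFiles` are hypothetical. Since Java 7, `ZipOutputStream` writes ZIP64 records automatically when the entry count or sizes require them, and `Deflater.BEST_SPEED` plays roughly the role of a "fast" compression setting.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical routine class: a sketch using java.util.zip, not zip4j.
public class ZipRoutine {

    /**
     * Streams every file in files into one zip at target.
     * Entry names are the file paths relative to baseDir.
     * Returns the number of entries written.
     */
    public static int zipFiles(Path baseDir, List<Path> files, Path target) throws IOException {
        int count = 0;
        // A large output buffer reduces the number of small writes to disk.
        try (ZipOutputStream zip = new ZipOutputStream(
                new BufferedOutputStream(Files.newOutputStream(target), 1 << 20))) {
            // Favour speed over compression ratio.
            zip.setLevel(Deflater.BEST_SPEED);
            for (Path file : files) {
                // Zip entry names always use '/' as the separator.
                String name = baseDir.relativize(file).toString().replace('\\', '/');
                zip.putNextEntry(new ZipEntry(name));
                // Copies the file's bytes into the current entry without closing the zip stream.
                Files.copy(file, zip);
                zip.closeEntry();
                count++;
            }
        }
        return count;
    }
}
```

Whether this is faster or slower than zip4j on a 20-30 GB volume would have to be measured; the sketch only shows the shape of a streaming routine.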
My question is: is it fine and reliable to use zip4j in a routine to achieve this, or is there anything I should consider that could cause an issue down the line? Alternatively, is it possible to speed this up using a Talend component itself?
Hello,
According to the TalendHelpCenter: tfilearchive documentation, the component has a compress level option that lets you choose from three compression levels.
Could you let us know whether you can use this option?
If the size of a file or the total size of the archive exceeds 4 GB, or there are more than 65,535 files inside the archive, you need to set the ZIP64 mode to ALWAYS.
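The classic zip format stores the entry count in 16 bits and sizes and offsets in 32 bits, which is where these limits come from. A rough pre-check can be sketched as below; the class `Zip64Check` is hypothetical, and it uses uncompressed sizes as an approximation, since compression and header overhead change the real archive size. Because the largest file can never exceed the total, a single check on the total also covers the per-file limit.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: approximates whether an archive would exceed classic zip limits.
public class Zip64Check {
    static final long MAX_ENTRIES = 0xFFFFL;     // 65,535 entries (16-bit field)
    static final long MAX_BYTES = 0xFFFFFFFFL;   // 4 GiB - 1 byte (32-bit field)

    /** True if the entry count or the total uncompressed size is past the classic limits. */
    public static boolean needsZip64(long entryCount, long totalUncompressedBytes) {
        return entryCount > MAX_ENTRIES || totalUncompressedBytes > MAX_BYTES;
    }

    /** Same check, computed from the files that would go into the archive. */
    public static boolean needsZip64(List<Path> files) throws IOException {
        long total = 0;
        for (Path f : files) {
            total += Files.size(f);
        }
        return needsZip64(files.size(), total);
    }
}
```

For the volume in the question (20-30K files, 20-30 GB), the entry count stays under 65,535 but the total size is well past 4 GB, so ZIP64 would be needed either way.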
Sometimes transformation bottlenecks occur because a large monolithic job tries to do many things at once. Breaking such a job down into smaller jobs can make data processing more efficient.
Feel free to post your issue here.
Best regards
Sabrina
Thanks for the info.
Getting back on this after a while.
I can use the options you mentioned above: Fast compression and ZIP64 mode set to ALWAYS. I've already tried them, and they certainly speed up the process to some extent.
However, for my scenario the overall time taken to create the zip with the component is still significantly higher (over 1 hour). Could you also please confirm the following:
Is there a way to add files when creating a zip in Talend?
I'd like to know which library tFileArchive uses underneath: is it zip4j or something else? I could use the same library and create my own routine. Let me know if there is anything I need to consider when doing this (for upgrades and maintaining the code).
Could you please advise on the above? It would be helpful.
Hello,
Talend tFileArchive also uses the zip4j library underneath.
When I searched for zip4j in the Maven repository, it should be zip4j-1.3.3.jar.
Best regards
Sabrina
Hi Sabrina,
Thanks for the response.
- Suppose my routine uses the platform jar zip4j version 1.3.3, and after a Talend upgrade the tFileArchive component starts using a newer version, say 2.11.0. If I had left "required" unchecked in the routine, the routine would still be compiled against zip4j 1.3.3, but that jar would not be included in the job build; the build would only include zip4j 2.11.0 (used by the component). Is this correct? Or, since the platform jar was updated to a newer version and the routine refers to the platform jar, will the routine also use the newer version, or fail to compile so that we have to update the library to use the new jar? Could you please advise?
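One way to see which zip4j jar a built job actually loads at runtime is a small diagnostic that asks the JVM where a class came from. This is a sketch under assumptions: the class `JarLocator` and method `whereIs` are hypothetical, and the zip4j class names in the usage note assume the usual packages (`net.lingala.zip4j.core.ZipFile` in the 1.x line, `net.lingala.zip4j.ZipFile` in 2.x).

```java
import java.net.URL;
import java.security.CodeSource;

// Hypothetical diagnostic helper: reports which jar or directory a class was loaded from.
public class JarLocator {

    /** Returns the location className was loaded from, or a short message if unavailable. */
    public static String whereIs(String className) {
        try {
            // initialize = false: locate the class without running its static initializers.
            Class<?> cls = Class.forName(className, false, JarLocator.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                // JDK classes loaded by the bootstrap loader have no code source.
                return "bootstrap/JDK class: " + className;
            }
            URL location = source.getLocation();
            return location.toString();
        } catch (ClassNotFoundException e) {
            return "not on classpath: " + className;
        }
    }
}
```

Called from a tJava in the built job, for example `System.out.println(JarLocator.whereIs("net.lingala.zip4j.ZipFile"));` and the same for `net.lingala.zip4j.core.ZipFile`, the output would show which of the two versions is really on the classpath.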
- If I have to include an external jar that is not available as a Talend platform jar, what's the best way to maintain the dependency? Is there a standard way to upgrade and maintain the dependency over time?