BipinNS
Contributor

tFileArchive component taking long time when zipping large volume of files

We have a scenario in which the tFileArchive component is used to add 20,000-30,000+ files to a zip package. The package size ranges from 20 to 30 GB. However, tFileArchive takes a long time (about 2 hours) to do this. I tried the faster settings for the component, but there was not much improvement.

I tried creating a routine using the zip4j Java library. With zip4j, the archiving time dropped drastically for the same volume: the gain is more than 50%, and I can create the zip in under an hour.

My question is: is it fine and reliable to use zip4j in a routine to achieve this, or is there anything to consider that could cause issues down the line? Alternatively, is it possible to speed this up using a Talend component itself?

4 Replies
Anonymous
Not applicable

Hello,

According to the online component documentation (TalendHelpCenter: tfilearchive), there is a Compress level option that offers three levels:

  • Best: the compression quality will be optimum, but the compression time will be long.
  • Normal: the compression quality and time will be average.
  • Fast (no compression): the compression will be fast, but the quality will be lower.

Could you let us know whether you can use this option?

If the file size or the total size of the archive exceeds 4GB or there are more than 65536 files inside the archive, you need to set the ZIP64 mode to ALWAYS.
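
To make the compress-level trade-off above concrete, here is a minimal sketch. It is only an illustration under stated assumptions: it uses the JDK's java.util.zip rather than tFileArchive or zip4j, and the names CompressionLevelDemo and zippedSize are made up for this example. It zips the same in-memory data at two Deflater levels so the size difference can be compared.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; not part of Talend or zip4j.
class CompressionLevelDemo {

    /** Zips one in-memory entry at the given Deflater level and returns the archive size in bytes. */
    public static int zippedSize(byte[] data, int level) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            zip.setLevel(level);                        // 0 = no compression ... 9 = best
            zip.putNextEntry(new ZipEntry("sample.dat"));
            zip.write(data);
            zip.closeEntry();
        }
        return buffer.size();
    }

    public static void main(String[] args) throws IOException {
        // Highly repetitive sample data, so the effect of the level is easy to see.
        byte[] data = new byte[1_000_000];
        Arrays.fill(data, (byte) 'A');

        System.out.println("input bytes:     " + data.length);
        System.out.println("no compression:  " + zippedSize(data, Deflater.NO_COMPRESSION));
        System.out.println("best compression: " + zippedSize(data, Deflater.BEST_COMPRESSION));
    }
}
```

With no compression the archive is slightly larger than the input because of the zip headers. How much time the higher levels cost depends on the data, so it is worth measuring on a sample of the real files.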

Sometimes transformation bottlenecks happen because of a large monolithic job that tries to do many things at once. Break down such large jobs into smaller jobs that are more efficient for data processing.

Feel free to post your issue here.

Best regards

Sabrina

BipinNS
Contributor
Author

Thanks for the info.

Getting back on this after a while.

I can use the options you mentioned above (Fast compression and ZIP64 mode set to ALWAYS). I have already tried them, and they do speed up the process to some extent.

However, for my scenario the overall time to create the zip with the component is still significantly higher, at over 1 hour. Could you also please confirm the points below:

  • For my use case, I need to zip selected files from a folder, but tFileArchive only offers matching files by a pattern. I need to add specific files (a pattern doesn't help) and create the zip. Since there is no such option, we currently identify the files to add, copy them to another folder, and then zip everything in that folder with tFileArchive. Copying the files takes additional time.

Is there a way to add specific files when creating a zip in Talend?

  • When I use the zip4j library in a routine, the whole process takes half the time compared with the Talend archive component.

I wanted to know which library tFileArchive uses underneath: is it zip4j or something else? I could use the same library and create my own routine. Let me know if I need to consider anything while doing this (for upgrades or maintaining the code).

Could you please advise on the above? It would be helpful.
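
For reference, the selective-files requirement described above can be sketched as follows. This is a rough illustration, not the zip4j routine itself: it uses the JDK's java.util.zip so that it has no external dependency, and the names SelectiveZip and zipFiles are hypothetical. It writes an explicit list of files straight into the archive from their current location, so no staging copy is needed.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper; names are assumptions made for this sketch.
class SelectiveZip {

    /**
     * Writes exactly the given files into one zip, reading each from where it already lives.
     * Entries are named by file name only, so two files with the same name would make
     * putNextEntry throw a ZipException (duplicate entry).
     */
    public static void zipFiles(List<Path> files, Path target, int level) throws IOException {
        try (OutputStream out = Files.newOutputStream(target);
             ZipOutputStream zip = new ZipOutputStream(new BufferedOutputStream(out, 1 << 16))) {
            zip.setLevel(level);
            for (Path file : files) {
                zip.putNextEntry(new ZipEntry(file.getFileName().toString()));
                Files.copy(file, zip);   // streams the file into the current entry
                zip.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java SelectiveZip <target.zip> <file1> [<file2> ...]
        if (args.length < 2) {
            System.err.println("usage: SelectiveZip <target.zip> <file1> [<file2> ...]");
            return;
        }
        List<Path> files = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            files.add(Paths.get(args[i]));
        }
        zipFiles(files, Paths.get(args[0]), Deflater.BEST_SPEED);
    }
}
```

The same shape should carry over to a routine that takes the list of selected paths from the job; whether it matches the speed seen with zip4j would need to be measured.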

Anonymous
Not applicable

Hello,

Talend's tFileArchive also uses the zip4j library underneath.

Searching for zip4j in the mvn repository should turn up zip4j-1.3.3.jar.

Best regards

Sabrina

BipinNS
Contributor
Author

Hi Sabrina,

Thanks for the response.

- Suppose my routine uses the platform jar zip4j version 1.3.3, and after a Talend upgrade the tFileArchive component starts using a newer version, say 2.11.0. If "Required" is left unchecked for the routine's library, will the routine still be compiled against zip4j 1.3.3 while that jar is left out of the job build, so that the build includes only zip4j 2.11.0 (used by the component)? Is this correct? Or, since the platform jar was updated to a newer version and the routine referred to the platform jar, will the routine also use the newer version, or fail to compile so that we have to update the library to use the new jar? Could you please advise?

- If I have to include an external jar that is not available as a Talend platform jar, what is the best way to maintain the dependency? Is there a standard way to upgrade and maintain the dependency over time?
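
One way to see which jar a built job actually loads, as asked in the first question above, is a small runtime check. This is a sketch only: JarOrigin, locationOf and versionOf are hypothetical names. The zip4j class name to pass in is also an assumption (net.lingala.zip4j.core.ZipFile in 1.x, net.lingala.zip4j.ZipFile in 2.x, as far as I know). The version lookup works only if the jar's manifest declares Implementation-Version.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper; not part of Talend or zip4j.
class JarOrigin {

    /** Returns the location a class was loaded from, or a placeholder for JDK/bootstrap classes. */
    public static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        return (source == null || source.getLocation() == null)
                ? "(bootstrap or unknown)"
                : source.getLocation().toString();
    }

    /** Returns Implementation-Version from the jar manifest, or "unknown" if it is absent. */
    public static String versionOf(Class<?> type) {
        Package pkg = type.getPackage();
        String version = (pkg == null) ? null : pkg.getImplementationVersion();
        return version == null ? "unknown" : version;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass the fully qualified class name to inspect; defaults to a JDK class.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        Class<?> type = Class.forName(name);
        System.out.println(name + " loaded from " + locationOf(type)
                + ", version " + versionOf(type));
    }
}
```

Calling locationOf on the zip4j class from inside the routine and logging the result would show whether the job picked up the 1.3.3 jar or the newer one.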