Anonymous
Not applicable

tFileInputDelimited for Big Data Spark cannot read multiple gz files

Hi, I'm using the Talend Big Data studio Enterprise edition and need to read (extract) multiple gz files and then apply transformations on them.

In a normal DI job I used tFileUnarchive for this purpose, but it isn't available in Spark Big Data.

I know that tFileInputDelimited for Big Data can read gz files by default, but I have yet to find a way to make it take multiple files as input.

My files are named in this format:

File1-00001.out.gz
File1-00002.out.gz
...
File1-0075.out.gz
File2-00001.out.gz
File2-00002.out.gz
...
File2-00075.out.gz
3 Replies
Anonymous
Not applicable
Author

Please, what do you mean by spark big data?

Anonymous
Not applicable
Author

Sorry for sounding so vague, I'm new to Talend.

I'm talking about Big Data Batch jobs, which run on the Spark framework. Standard Batch jobs have components like tFileList and tFileUnarchive, but Big Data Batch jobs don't.
Anonymous
Not applicable
Author

Hi,

 

You will need to orchestrate a DI job and a BD job to solve this problem. Why not pass each gz file as a parameter to a BD job, which then performs the remaining steps? The parent job, which sends one file at a time, can be a normal DI job.
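
To make the idea concrete, here is a rough sketch of what the parent loop does, written in plain Python rather than as Talend components. In Studio the same flow would typically be built with tFileList iterating into tRunJob, with the file path passed as a context variable. The launcher script name `./bd_batch_job_run.sh` and the context variable name `input_file` are hypothetical placeholders; `--context_param name=value` is the option that built Talend job scripts accept for overriding a context variable.

```python
import fnmatch
import os
import tempfile


def list_gz_files(directory, pattern="*.out.gz"):
    """Return the matching file paths in sorted order (the tFileList-style step)."""
    names = sorted(n for n in os.listdir(directory) if fnmatch.fnmatch(n, pattern))
    return [os.path.join(directory, n) for n in names]


def build_child_commands(files, launcher="./bd_batch_job_run.sh", param="input_file"):
    """Build one child-job command per file, passing the path as a context parameter.

    Both the launcher name and the parameter name are hypothetical.
    """
    return [[launcher, "--context_param", f"{param}={path}"] for path in files]


if __name__ == "__main__":
    # Demo on a temporary directory with empty placeholder files.
    with tempfile.TemporaryDirectory() as d:
        for name in ("File1-00002.out.gz", "File1-00001.out.gz", "notes.txt"):
            open(os.path.join(d, name), "wb").close()
        for cmd in build_child_commands(list_gz_files(d)):
            # To actually launch the child job, swap print for
            # subprocess.run(cmd, check=True).
            print(" ".join(cmd))
```

Running one child job per file keeps the BD job simple, since it only ever sees a single input path, but each launch pays the Spark startup cost, so with 150 files it may be worth testing how long the whole loop takes.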

 

Warm Regards,
Nikhil Thampi
