talenddev1
Contributor

How to get the filename from a folder containing multiple files in a Talend big data Spark job

Here are the details:

  1. All the files in the directory ("C:\data\product") have the same schema.
  2. I can extract the data from the 3 files and output it to delimited files, but I cannot extract the filenames (product_Jan.txt, product_Feb.txt, product_Mar.txt) from the directory ("C:\data\product") and write them to a delimited file.

This can be achieved in DI using the tFileList component and ((String)globalMap.get("tFileList_1_CURRENT_FILE")), but I need to achieve it in a Talend Spark big data batch job.
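(For reference, the tFileList iteration exposes a few related global variables; they are read as expressions inside the Standard job, for example in a tMap output column or a tJava:)

// Inside the tFileList_1 iterate loop of a Standard (DI) job:
String name = (String) globalMap.get("tFileList_1_CURRENT_FILE");          // file name, e.g. product_Jan.txt
String path = (String) globalMap.get("tFileList_1_CURRENT_FILEPATH");      // full path
String dir  = (String) globalMap.get("tFileList_1_CURRENT_FILEDIRECTORY"); // containing directory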

Please share some suggestions for achieving this in a Talend Spark big data batch job.

Please find the attachment.

Anonymous
Not applicable

Hi,

tFileList is a file orchestration component and is available only in Standard jobs; there is no Spark-specific equivalent for it, so the component rightly belongs in the Standard job.

So if you want the processing to receive file names as parameters, you will have to use a parent-child relationship, where the Standard job calls the big data (BD) job as an independent child job. Alternatively, orchestrate both jobs so that the BD job is called once per file through a scheduler, with the file name passed as a parameter, as sketched below.
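A minimal sketch of that orchestration in plain Java, assuming the BD job has been built as a standalone script that accepts Talend's usual --context_param arguments (the script name "ProductSparkJob_run.bat" and the context variable "fileName" are illustrative, not from the thread):

import java.io.File;

public class RunSparkJobPerFile {
    public static void main(String[] args) throws Exception {
        // Mirror of tFileList + tRunJob: iterate the folder and launch the
        // built BD job once per file, passing the file name as a parameter.
        File[] files = new File("C:/data/product").listFiles(
                (dir, name) -> name.endsWith(".txt"));
        if (files == null) return;

        for (File f : files) {
            Process p = new ProcessBuilder(
                    "cmd", "/c", "ProductSparkJob_run.bat",
                    "--context_param", "fileName=" + f.getAbsolutePath())
                    .inheritIO()
                    .start();
            System.out.println(f.getName() + " -> exit code " + p.waitFor());
        }
    }
}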

 

Warm Regards,
Nikhil Thampi

Please show appreciation to our Talend community members by giving Kudos for sharing their time on your query. If your query is answered, please mark the topic as resolved 🙂

talenddev1
Contributor
Author

Hi Nikhil,

Thanks for your reply.

In my use case I have more than 100 files to process, so passing 100 file names as context parameters from a Standard job to the Spark job would be a huge task. Is there a solution within the Spark job itself? Its input component offers a file/folder option (I am using the folder option), so each file in the folder is already being iterated and processed.

I want to pick up the filename being processed in the stream/flow and load it into an output column, FILE_NAME.

Please find the attached screenshot.

 

Thanks


SparkJOB_ProductFIles.JPG
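For completeness: outside Talend's component palette, Spark itself can attach the source file name to each row through the built-in input_file_name() SQL function. A minimal standalone sketch of that idea, with the folder path and delimiter taken as assumptions from this thread:

import static org.apache.spark.sql.functions.input_file_name;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ProductFilesWithFileName {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("ProductFilesWithFileName")
                .master("local[*]") // adjust for your cluster
                .getOrCreate();

        // Read every delimited file in the folder in one pass; all files
        // share the same schema, so a single read covers them.
        Dataset<Row> products = spark.read()
                .option("delimiter", ";") // assumption: set your actual delimiter
                .csv("C:/data/product");

        // input_file_name() is a built-in Spark SQL function returning the
        // full path of the file each row was read from.
        Dataset<Row> withName = products.withColumn("FILE_NAME", input_file_name());

        withName.write().option("header", "true").csv("C:/data/product_out");
        spark.stop();
    }
}

Inside a Talend Spark batch job, the same function should be reachable from any component that accepts Spark SQL (tSqlRow, for instance, with something like SELECT row1.*, input_file_name() AS FILE_NAME FROM row1), though that is worth verifying against your Talend version.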