
How to get the filename from a folder containing multiple files in a Talend Big Data Spark job
Here are the details:
- All the files in the directory ("C:\data\product") have the same schema.
- I can extract the data from the three files and write it to delimited files, but I cannot extract the filenames (product_Jan.txt, product_Feb.txt, product_Mar.txt) from the directory ("C:\data\product") and write those names to a delimited file.
This can be achieved in a DI (Standard) job using the tFileList component and ((String)globalMap.get("tFileList_1_CURRENT_FILE")), but I need to achieve it in a Talend Spark Big Data batch job.
Please share suggestions on how to achieve this in Talend Spark Big Data batch jobs.
Please find the attachment.
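For context, this is roughly what the tFileList iteration gives me in the Standard job, written as a minimal plain-Java sketch (the class name is just illustrative, not Talend-generated code):

```java
import java.io.File;

// Minimal sketch of what tFileList + globalMap do in a Standard (DI) job:
// iterate over the directory and expose each file name.
public class ListProductFiles {
    public static void main(String[] args) {
        File dir = new File("C:\\data\\product");
        File[] files = dir.listFiles();
        if (files == null) return; // directory missing or not readable
        for (File f : files) {
            // In the DI job this value comes from
            // ((String) globalMap.get("tFileList_1_CURRENT_FILE"))
            System.out.println(f.getName()); // e.g. product_Jan.txt
        }
    }
}
```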

Hi,
tFileList is a file orchestration component and is available only in Standard jobs. There is no Spark-specific processing for this component to perform, so it rightly belongs in a Standard job.
If you need to pass file names as parameters for processing, you will have to use a parent-child relationship, calling the Big Data job as a child job from a Standard job. Alternatively, you can orchestrate the two jobs so that the Big Data job is triggered once per file through a scheduler, with the file name passed as a parameter. A rough sketch of that second option follows.
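As an illustration only, the parent process could loop over the directory and launch the exported Big Data job once per file, passing the name as a context parameter. The script path and the context variable name ("fileName") below are placeholders you would replace with your own:

```java
import java.io.File;

// Orchestration sketch: run the exported Big Data job once per file,
// passing the file name via --context_param. The script path and the
// "fileName" context variable are placeholders, not real names.
public class RunBdJobPerFile {
    public static void main(String[] args) throws Exception {
        File dir = new File("C:\\data\\product");
        File[] files = dir.listFiles();
        if (files == null) return;
        for (File f : files) {
            new ProcessBuilder(
                    "C:\\jobs\\product_bd_job\\product_bd_job_run.bat",
                    "--context_param", "fileName=" + f.getName())
                .inheritIO()   // show the child job's console output
                .start()
                .waitFor();    // one Spark job run per file
        }
    }
}
```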
Warm Regards,
Nikhil Thampi
Please appreciate our Talend community members by giving Kudos for sharing their time on your query. If your query is answered, please mark the topic as resolved 🙂

Hi Nikhil,
Thanks for your reply.
In my use case I have more than 100 files to process, so passing 100 file names as context parameters from a Standard job to the Spark job would be a huge task. Is there a solution within the Spark job itself? The input component offers a file/folder option (I am using the folder option), so each file in the folder is already iterated over and processed.
I want to extract the filename being processed in the stream/flow and load it into an output column, FILE_NAME.
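For reference, plain Spark can tag each row with its source file via input_file_name(); whether this expression can be used inside the Talend job (for example in a tSqlRow) is an assumption on my part. A sketch of the plain-Spark version:

```java
import static org.apache.spark.sql.functions.input_file_name;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Sketch in plain Spark: read every file in the folder and add a FILE_NAME
// column holding the path of the file each row came from.
public class TagRowsWithFileName {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("TagRowsWithFileName")
                .master("local[*]")
                .getOrCreate();

        Dataset<Row> products = spark.read()
                .option("delimiter", "|") // adjust to the actual delimiter
                .csv("C:/data/product/*.txt");

        // input_file_name() returns the full path of the source file per row
        Dataset<Row> withFile = products.withColumn("FILE_NAME", input_file_name());
        withFile.show(false);

        spark.stop();
    }
}
```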
Please find the attached screenshot.
Thanks
SparkJOB_ProductFIles.JPG

