talenddev1
Contributor

How to get the filename from a folder containing multiple files in a Talend big data Spark job

Here are the details:

  1. All the files in the directory ("C:\data\product") have the same schema.
  2. I can extract the data from the 3 files and output it to delimited files, but I cannot extract the filenames (product_Jan.txt, product_Feb.txt, product_Mar.txt) from the directory ("C:\data\product") and write them to a delimited file.

This can be achieved in DI using the tFileList component and ((String)globalMap.get("tFileList_1_CURRENT_FILE")), but I need to achieve it in a Talend Spark big data batch job.
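(For reference, the tFileList iteration exposes a few related global variables; they are read as expressions inside the Standard job, for example in a tMap output column or a tJava:)

// Inside the tFileList_1 iterate loop of a Standard (DI) job:
String name = (String) globalMap.get("tFileList_1_CURRENT_FILE");          // file name, e.g. product_Jan.txt
String path = (String) globalMap.get("tFileList_1_CURRENT_FILEPATH");      // full path
String dir  = (String) globalMap.get("tFileList_1_CURRENT_FILEDIRECTORY"); // containing directory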

Please share some suggestions for achieving this in a Talend Spark big data batch job.

Please find the attachment.

Anonymous
Not applicable

Hi,

tFileList is a file orchestration component and is available only in Standard jobs; there is no Spark-specific equivalent for it, so the component rightly belongs in the Standard job.

So if you want the processing to receive file names as parameters, you will have to use a parent-child relationship, where the Standard job calls the big data (BD) job as an independent child job. Alternatively, orchestrate both jobs so that the BD job is called once per file through a scheduler, with the file name passed as a parameter, as sketched below.
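A minimal sketch of that orchestration in plain Java, assuming the BD job has been built as a standalone script that accepts Talend's usual --context_param arguments (the script name "ProductSparkJob_run.bat" and the context variable "fileName" are illustrative, not from the thread):

import java.io.File;

public class RunSparkJobPerFile {
    public static void main(String[] args) throws Exception {
        // Mirror of tFileList + tRunJob: iterate the folder and launch the
        // built BD job once per file, passing the file name as a parameter.
        File[] files = new File("C:/data/product").listFiles(
                (dir, name) -> name.endsWith(".txt"));
        if (files == null) return;

        for (File f : files) {
            Process p = new ProcessBuilder(
                    "cmd", "/c", "ProductSparkJob_run.bat",
                    "--context_param", "fileName=" + f.getAbsolutePath())
                    .inheritIO()
                    .start();
            System.out.println(f.getName() + " -> exit code " + p.waitFor());
        }
    }
}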

 

Warm Regards,
Nikhil Thampi

Please show appreciation to our Talend community members by giving Kudos for sharing their time on your query. If your query is answered, please mark the topic as resolved 🙂

talenddev1
Contributor
Author

Hi Nikhil,

Thanks for your reply.

In my use case I have more than 100 files to process, so passing 100 file names as context parameters from a Standard job to the Spark job would be a huge task. Is there a solution within the Spark job itself? Its input component offers a file/folder option (I am using the folder option), so each file in the folder is already being iterated and processed.

I want to pick up the filename being processed in the stream/flow and load it into an output column, FILE_NAME.

Please find the attached screenshot.

 

Thanks


SparkJOB_ProductFIles.JPG
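For completeness: outside Talend's component palette, Spark itself can attach the source file name to each row through the built-in input_file_name() SQL function. A minimal standalone sketch of that idea, with the folder path and delimiter taken as assumptions from this thread:

import static org.apache.spark.sql.functions.input_file_name;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ProductFilesWithFileName {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("ProductFilesWithFileName")
                .master("local[*]") // adjust for your cluster
                .getOrCreate();

        // Read every delimited file in the folder in one pass; all files
        // share the same schema, so a single read covers them.
        Dataset<Row> products = spark.read()
                .option("delimiter", ";") // assumption: set your actual delimiter
                .csv("C:/data/product");

        // input_file_name() is a built-in Spark SQL function returning the
        // full path of the file each row was read from.
        Dataset<Row> withName = products.withColumn("FILE_NAME", input_file_name());

        withName.write().option("header", "true").csv("C:/data/product_out");
        spark.stop();
    }
}

Inside a Talend Spark batch job, the same function should be reachable from any component that accepts Spark SQL (tSqlRow, for instance, with something like SELECT row1.*, input_file_name() AS FILE_NAME FROM row1), though that is worth verifying against your Talend version.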