Anonymous
Not applicable

tHDFSGet all files in directory

Hi,

 

Is it possible to get all files in an HDFS directory and save them as a single file or as multiple files on a local machine?

I want to extract files from a directory using regular expressions, but it doesn't seem to work. However, the documentation here (https://help.talend.com/reader/g8zdjVE7fWNUh3u4ztO6Dw/PUKLf_wAqRMmwe4w~Lw1wA) says that regular expressions are supported in filemasks.

 

I'm basically trying to grab files that match: ".+part-.*" inside a directory (iterating through subdirectories).

These files are the output of a tFileOutputDelimited component in a Spark Streaming job.

 

Thank you.

 


Accepted Solutions
Anonymous
Not applicable
Author

Have you tried tHDFSList? You can specify a filemask (glob or regex) with it and iterate through the files, directories, and subdirectories of a specific HDFS location. You could then pass the global variable

((String)globalMap.get("tHDFSList_1_CURRENT_FILEPATH"))

to the "HDFS directory" property of tHDFSGet.
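
To check what the mask from the question would actually select, here is a minimal plain-Java sketch, independent of Talend. It assumes the regex is tested against the whole path with `matches()`; I have not confirmed which string the component really compares, so treat that as an assumption. The class name, method name, and sample paths are made up for illustration.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class PartFileFilter {
    // Regex taken from the question. With matches(), the whole string must fit
    // the pattern, so at least one character is required before "part-".
    private static final Pattern PART_FILE = Pattern.compile(".+part-.*");

    // Returns only the paths that match the pattern in full.
    // Assumption: the mask is applied to the full path, not just the file name.
    static List<String> selectPartFiles(List<String> paths) {
        return paths.stream()
                .filter(p -> PART_FILE.matcher(p).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical paths, shaped like Spark output directories.
        List<String> paths = List.of(
                "/user/demo/out/1500000000000/part-00000",
                "/user/demo/out/1500000000000/_SUCCESS",
                "/user/demo/out/1500000060000/part-00001");
        for (String p : selectPartFiles(paths)) {
            System.out.println(p);
        }
    }
}
```

One thing this makes visible: a bare file name such as `part-00000` does not match `.+part-.*`, because `.+` needs at least one leading character. If the mask turns out to be compared against the file name only, `part-.*` may be the safer pattern to try.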

Anonymous
Not applicable
Author

Thank you! I happened to stumble across an example at the bottom of the documentation too. I didn't know that autocomplete also works in the component fields.