_AnonymousUser
Specialist III

Synchronize files from Unix to HDFS

I want to set up a job that gets a list of files from a remote (Unix) server and compares it with the files listed in HDFS. If a file does not exist in HDFS, I want to get it from the Unix server and put it into HDFS. Can anyone point me in the right direction on how I might do this? I am using the latest 6.1 version of Talend Big Data Studio.
6 Replies
Anonymous
Not applicable

Hi,
Which protocol do you want to use to access the remote (Unix) server and get the file: FTP, SCP, or HTTP? And do you want to compare just the file names or also the file content?
Regards
Shong
_AnonymousUser
Specialist III
Author

I have access to SCP and FTP. I just want to compare the file names, since they are unique per day.
Anonymous
Not applicable

Hi,
You can use tFTPFileList and tHDFSList to get all the file names from the remote server and from HDFS, then do an inner join in tMap between the remote file names and the HDFS file names and catch the inner join rejects (the unmatched records), e.g.:
tFTPFileList --iterate--> tFixedFlowInput1 --main--> tUnite --main--> tMap --out1-->
                                                                        |
                                                                     (lookup)
                                                                        |
tHDFSList --iterate--> tFixedFlowInput2 --main--> tUnite ---------------+
tFixedFlowInput1: define one column and set its value as:
((String)globalMap.get("tFTPFileList_1_CURRENT_FILE"))
tFixedFlowInput2: define one column and set its value as:
((String)globalMap.get("tHDFSList_1_CURRENT_FILE"))
Refer to this KB article:
https://help.talend.com/pages/viewpage.action?pageId=190513450
Regards
Shong
Anonymous
Not applicable

Thank you for that!
I created the flow as you suggested. I am not sure why the tUnite components are needed, but I have included them anyway.

After this, do I do a tFTPGet to get the file locally and then a tHDFSPut? Is there any way to send the file directly from the remote Unix server to HDFS?
Anonymous
Not applicable

Hi 
The tUnite component is needed in this job to merge all the file names before doing the join.
After this, do I do a tFTPGet to get the file locally and then a tHDFSPut? Is there any way to send the file directly from the remote Unix server to HDFS?

There is no direct way to move the file between the remote server and HDFS; you have to get it onto the local system first and then put it into HDFS.
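For what it is worth, the tFTPGet-then-tHDFSPut sequence conceptually amounts to something like the sketch below, written against Apache Commons Net and the Hadoop FileSystem API; the host, credentials, and paths are placeholders, not settings from your job:

    import java.io.FileOutputStream;
    import java.io.OutputStream;

    import org.apache.commons.net.ftp.FTP;
    import org.apache.commons.net.ftp.FTPClient;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FtpToHdfs {
        public static void main(String[] args) throws Exception {
            String fileName = "data_20160102.csv";               // placeholder file name
            String localTmp = "/tmp/" + fileName;

            // Step 1: what tFTPGet does -- pull the file onto the local filesystem.
            FTPClient ftp = new FTPClient();
            ftp.connect("unix.example.com");                     // placeholder host
            ftp.login("user", "password");                       // placeholder credentials
            ftp.setFileType(FTP.BINARY_FILE_TYPE);
            try (OutputStream out = new FileOutputStream(localTmp)) {
                ftp.retrieveFile("/data/out/" + fileName, out);  // placeholder remote dir
            }
            ftp.logout();
            ftp.disconnect();

            // Step 2: what tHDFSPut does -- copy the local file into HDFS.
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode:8020");    // placeholder NameNode URI
            FileSystem hdfs = FileSystem.get(conf);
            hdfs.copyFromLocalFile(new Path(localTmp), new Path("/data/in/" + fileName));
            hdfs.close();
        }
    }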
Regards
Shong
Anonymous
Not applicable

Thanks for all of your help!