Anonymous
Not applicable

Trouble with tFileOutputDelimited

Hello.
- SCENARIO

  • I need to download files from an FTP server to a local folder and then delete them from the server.

  • Files are continuously being uploaded to the source FTP folder, so I can't simply run a "delete all files" job.

I would solve it this way:
tFTPConnection -> tFTPFileList -> tFTPFileProperties -> tFileOutputDelimited

With this job I:
1) open the connection (OK)
2) read all the file names in the directory (OK)
3) fetch the details of each file (OK)
4) write the list of files to a CSV file, so that the next job can download and delete them without deleting any new file (WRONG)

Everything else works: the file names are read, the properties are fetched and the CSV is generated, but it contains only 3 blank rows with no data (there are 3 files on the FTP server).

How can I solve this? I probably need to create a SCHEMA to map the tFTPFileProperties output to something, but I don't know what or where.

Any solution?

Thanks in advance
Tato

6 Replies
vapukov
Master II

If you want to go your way, you must convert the tFTPFileList result into a flow using

 ((String)globalMap.get("tFTPFileList_1_CURRENT_FILE"))


or

 ((String)globalMap.get("tFTPFileList_1_CURRENT_FILEPATH"))




tFTPConnection -> tFTPFileList -> (iterate) -> convert variable to flow (tMap, tFixedFlowInput, tJavaFlex) -> tFileOutputDelimited
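To illustrate the "convert variable to flow" step, here is a rough sketch in plain Java of what the conversion does on each iteration: it reads the current file name from globalMap and emits it as one row. This is not Talend-generated code. The `globalMap` below is an ordinary HashMap standing in for the job's map, and `FileRow`, `toRow` and the `/in/` folder are hypothetical names used only for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GlobalMapToFlow {
    // Hypothetical row type standing in for a two-column output schema.
    public static final class FileRow {
        public final String fileName;
        public final String filePath;

        public FileRow(String fileName, String filePath) {
            this.fileName = fileName;
            this.filePath = filePath;
        }

        // One delimited line, as a delimited-file output would write it.
        public String toCsv() {
            return fileName + ";" + filePath;
        }
    }

    // Once per iteration: copy the two variables from the map into a row.
    public static FileRow toRow(Map<String, Object> globalMap) {
        String name = (String) globalMap.get("tFTPFileList_1_CURRENT_FILE");
        String path = (String) globalMap.get("tFTPFileList_1_CURRENT_FILEPATH");
        return new FileRow(name, path);
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        List<String> csvLines = new ArrayList<>();
        // Simulate three iterations of the list component (assumed file names).
        for (String name : new String[] {"a.txt", "b.txt", "c.txt"}) {
            globalMap.put("tFTPFileList_1_CURRENT_FILE", name);
            globalMap.put("tFTPFileList_1_CURRENT_FILEPATH", "/in/" + name);
            csvLines.add(toRow(globalMap).toCsv());
        }
        csvLines.forEach(System.out::println);
    }
}
```

If the row columns are never filled from the variables, the output gets one empty row per iteration, which matches the three blank rows described in the question.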


But really you can do it all in a single step:

tFTPConnection -> tFTPFileList -> (iterate) -> tFTPGet - tFTPDelete
For tFTPGet - tFTPDelete, use ((String)globalMap.get("tFTPFileList_1_CURRENT_FILEPATH")) as the file name.


It should work as expected.
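To make the idea concrete, here is a small simulation in plain Java (not Talend code) of "list once, then get and delete each listed name" while another process keeps uploading. The in-memory maps stand in for the remote and local folders, and `runOnce` and `uploaderTick` are hypothetical names used only for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SnapshotGetDelete {
    // One job run: take the listing once, then get and delete by name.
    public static int runOnce(Map<String, String> remote,
                              Map<String, String> local,
                              Runnable uploaderTick) {
        // List step: a copy of the names present at this moment.
        List<String> snapshot = new ArrayList<>(remote.keySet());
        for (String name : snapshot) {          // iterate over the snapshot
            local.put(name, remote.get(name));  // get step
            remote.remove(name);                // delete step, this name only
            uploaderTick.run();                 // meanwhile, new files arrive
        }
        return snapshot.size();
    }

    public static void main(String[] args) {
        Map<String, String> remote = new TreeMap<>();
        Map<String, String> local = new TreeMap<>();
        remote.put("f1", "data1");
        remote.put("f2", "data2");
        remote.put("f3", "data3");

        int[] counter = {0};
        // Assumed uploader: adds one new file after each processed file.
        int handled = runOnce(remote, local,
                () -> remote.put("new" + (++counter[0]), "fresh"));

        System.out.println("handled: " + handled);
        System.out.println("local:   " + local.keySet());
        System.out.println("remote:  " + remote.keySet());
    }
}
```

Files that arrive after the listing are never passed to the delete step, so they stay on the server until the next run lists them.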
Anonymous
Not applicable
Author

Does tFTPDelete delete ONLY the files listed by tFTPList, or ALL files in the directory?
I ask because the source folder is extremely dynamic: while the download runs, another job continuously uploads to it, so I need to be sure that only the files already downloaded are deleted.
Wouldn't your job look like the following?
tFTPConnection -> [tFTPList -> (iterate) -> tFTPGet -> (iterate) -> tFTPDelete]
Thanks
vapukov
Master II

tFTPDelete deletes only the file that you specify,
so there should be no collisions.
Anonymous
Not applicable
Author

Sorry, but I don't understand: where can I set the files to be deleted?
My scenario:

  • a job uploads 100 files per second to the folder (usually about 1 KB each, but they can be bigger)

  • DownloadJob runs every 10 s

  • When the job starts (after the first 10 s), tFTPList fetches the names of 1,000 files

  • tFTPGet (linked to tFTPList by ITERATE) downloads all 1,000 files (I'm sure because Talend Studio shows "1.000 execs finished" on the iterate link)

  • I linked tFTPGet to tFTPDelete with an ITERATE link, but it shows "1 exec finished". Why?

  • If the job takes 1 s to run, there are 1,000 files at the beginning, but by the time tFTPDelete runs there will be 1,100 files

How can I be sure that ONLY the 1,000 files fetched at the beginning (and already downloaded) will be deleted?

Thanks
Anonymous
Not applicable
Author

After testing tFTPDelete... I don't understand how it works!
Scenario:

  • Start with 10 files already uploaded to the source FTP folder

  • Run DeleteJob while 125 files of different sizes (from 2M to 400M) are being uploaded

  • At the end of the job, only the first 52 files had been downloaded (and deleted); the files were numbered, the first 49 were at most 2M and the 50th and 51st were 400M

  • At the end, 135 files had been uploaded

So I think that with this component it is not possible to know the exact state of the source folder at the moment the job starts.

I will solve it by creating 2 jobs:

  • one scans the source folder and writes the list of files to a CSV file

  • one reads the CSV file to know which files need to be downloaded
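As a rough sketch of this two-job idea in plain Java (not Talend code): the first step writes a "name;size" manifest, the second reads the names back and would download and delete only those. The class name, the method names `writeManifest` and `readNames`, the `;` separator and the sample sizes are all assumptions made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ManifestTwoJobs {
    // Job 1: write one "name;size" line per file seen in the listing.
    public static void writeManifest(Path manifest, List<String[]> listing) {
        List<String> lines = new ArrayList<>();
        for (String[] entry : listing) {
            lines.add(entry[0] + ";" + entry[1]);
        }
        try {
            Files.write(manifest, lines, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Job 2: read back only the names; these are the files to get and delete.
    public static List<String> readNames(Path manifest) {
        List<String> names = new ArrayList<>();
        try {
            for (String line : Files.readAllLines(manifest, StandardCharsets.UTF_8)) {
                if (!line.isEmpty()) {
                    names.add(line.split(";", -1)[0]);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        Path manifest = Files.createTempFile("manifest", ".csv");
        // Assumed listing: file name and size in bytes.
        writeManifest(manifest, Arrays.asList(
                new String[] {"001.dat", "1024"},
                new String[] {"002.dat", "2048"}));
        System.out.println(readNames(manifest));
        Files.delete(manifest);
    }
}
```

Anything uploaded after the manifest was written is not in it, so the second job leaves it alone.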


Bye
vapukov
Master II

akatato wrote:
Sorry, but I don't understand: where can I set the files to be deleted?
[...]
How can I be sure that ONLY the 1,000 files fetched at the beginning (and already downloaded) will be deleted?

How? Why not run a test if you don't trust anybody?
It's easy: run the job and compare the results!
tFTPDelete deletes ONLY the file whose name you (once more: YOU 🙂) pass to it,
so it doesn't matter how many files arrive per second or how often the download job runs.

  • tFTPList is the only component that reads file names

  • tFTPGet fetches the file whose name it receives from tFTPList

  • tFTPDelete deletes the file whose name it receives from tFTPList

  • So whatever happens, every file that tFTPList lists is downloaded and deleted, and any file it does not list is left untouched


A potential collision can occur only when a file has been created but the other job has not finished writing to it.
To avoid this, list the files in date-time order, oldest first. Then, even if the other job has not finished writing a file when tFTPList fetches its name, it will have finished by the time all the other files have been processed.
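A minimal sketch of the oldest-first ordering in plain Java, assuming each listing entry carries a name and a last-modified timestamp. `RemoteFile` and `oldestFirst` are hypothetical names, not Talend API, and the timestamps are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class OldestFirst {
    // Hypothetical listing entry: file name plus last-modified time in millis.
    public static final class RemoteFile {
        public final String name;
        public final long modifiedMillis;

        public RemoteFile(String name, long modifiedMillis) {
            this.name = name;
            this.modifiedMillis = modifiedMillis;
        }
    }

    // Return the names sorted by modification time, oldest first,
    // so the most recently created files are processed last.
    public static List<String> oldestFirst(List<RemoteFile> listing) {
        List<RemoteFile> sorted = new ArrayList<>(listing);
        sorted.sort(Comparator.comparingLong((RemoteFile f) -> f.modifiedMillis));
        List<String> names = new ArrayList<>();
        for (RemoteFile f : sorted) {
            names.add(f.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<RemoteFile> listing = Arrays.asList(
                new RemoteFile("newest.dat", 3000L),
                new RemoteFile("oldest.dat", 1000L),
                new RemoteFile("middle.dat", 2000L));
        System.out.println(oldestFirst(listing));
    }
}
```

The ordering only buys time for the newest files. It does not prove that a given upload has finished.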


P.S. If you are seriously worried about collisions (they can of course happen), change the logic from pull to push: switch from FTP to a message queue, which handles these collisions more reliably.