gopal16
Contributor

Retrieve selected files from an S3 bucket and process them directly in the job

My scenario is that every day source files arrive in different, dynamically named date folders in S3. I need to pick up the files that arrived after the last processed timestamp and use them in the main flow of the job. I am using the tS3List component to list the files by prefix (I can't give the complete path, as the folders are named with dynamic dates). After that, in the tS3Get component, I need to fetch only the files that are newer than the last processed timestamp, but not many options are available: I can only provide the tS3List current key in the Key section, and with that I am getting the older, already processed files as well. Also, once I have the right files, I don't want to store them locally; I want to process them directly in the job. Please help me achieve this scenario. Thanks!!
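A minimal sketch of the timestamp filtering described above, written against the AWS SDK for Java v1 (for example inside a tJava or tJavaFlex). The bucket name, prefix, and lastProcessed value are placeholder assumptions, not settings taken from this job:

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.ListObjectsV2Request;
import com.amazonaws.services.s3.model.ListObjectsV2Result;
import com.amazonaws.services.s3.model.S3ObjectSummary;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

public class NewS3Keys {
    // Returns the keys under the given prefix whose LastModified is after lastProcessed.
    public static List<String> keysNewerThan(String bucket, String prefix, Date lastProcessed) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        List<String> newKeys = new ArrayList<>();
        ListObjectsV2Request req = new ListObjectsV2Request()
                .withBucketName(bucket)   // placeholder, e.g. "my-bucket"
                .withPrefix(prefix);      // placeholder, e.g. "incoming/" (date folders vary underneath)
        ListObjectsV2Result result;
        do {
            result = s3.listObjectsV2(req);
            for (S3ObjectSummary summary : result.getObjectSummaries()) {
                // keep only objects modified after the last processed timestamp
                if (summary.getLastModified().after(lastProcessed)) {
                    newKeys.add(summary.getKey());
                }
            }
            // keep paging until every object under the prefix has been listed
            req.setContinuationToken(result.getNextContinuationToken());
        } while (result.isTruncated());
        return newKeys;
    }
}

The same comparison can equally be done on the listing flow inside the Talend job before the keys are handed to tS3Get, which is the direction the thread takes below.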

19 Replies
gopal16
Contributor
Author

Sorry Manohar. With your approach, I am getting only the last file instead of all 14 files. Below is a screenshot of that.

 

[screenshot: 0683p000009MaFd.jpg]

gopal16
Contributor
Author

I think I found the solution. The job below worked and was able to filter and fetch all the required files.

 

[screenshot: 0683p000009MaFi.jpg]

Now I need to work on loading these files directly into a table, without placing them locally, using the S3 Select option in tS3Get. If anyone has succeeded with the S3 Select option, please let me know.
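For context, a rough sketch of what an S3 Select call looks like with the AWS SDK for Java v1: the SQL runs on S3 and matching records are streamed back, so nothing has to land on local disk first. The bucket, key, and query are placeholders, and this is not the internal code of tS3Get's S3 Select option:

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.CSVInput;
import com.amazonaws.services.s3.model.CSVOutput;
import com.amazonaws.services.s3.model.ExpressionType;
import com.amazonaws.services.s3.model.FileHeaderInfo;
import com.amazonaws.services.s3.model.InputSerialization;
import com.amazonaws.services.s3.model.OutputSerialization;
import com.amazonaws.services.s3.model.SelectObjectContentRequest;
import com.amazonaws.services.s3.model.SelectObjectContentResult;
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class S3SelectSketch {
    public static void main(String[] args) throws Exception {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

        SelectObjectContentRequest request = new SelectObjectContentRequest();
        request.setBucketName("my-bucket");                  // placeholder
        request.setKey("incoming/2020-01-01/data.csv");      // placeholder
        request.setExpression("SELECT * FROM S3Object s");   // placeholder query
        request.setExpressionType(ExpressionType.SQL);

        // Describe the CSV input and ask for CSV back
        InputSerialization input = new InputSerialization();
        CSVInput csvIn = new CSVInput();
        csvIn.setFileHeaderInfo(FileHeaderInfo.USE);         // first line is a header row
        input.setCsv(csvIn);
        request.setInputSerialization(input);

        OutputSerialization output = new OutputSerialization();
        output.setCsv(new CSVOutput());
        request.setOutputSerialization(output);

        try (SelectObjectContentResult result = s3.selectObjectContent(request);
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(result.getPayload().getRecordsInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // each line is a selected record streamed from S3; hand it to the DB load here
                System.out.println(line);
            }
        }
    }
}

Whether the S3 Select checkbox in tS3Get exposes all of these options is a separate question; the sketch is only meant to show the shape of the call.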

manodwhb
Champion II

@gopal16 , you need to use tFlowToIterate to pass the files one by one to tS3Get.
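To make that concrete, a sketch of the wiring, assuming the flow entering tFlowToIterate is named row2 and carries the object key in a column named key (both names are assumptions about this particular job). tFlowToIterate stores each column of the current row in globalMap under "<rowName>.<columnName>", so the expression in tS3Get's Key field can read it back:

tIterateToFlow --(main: row2)--> tFlowToIterate --(iterate)--> tS3Get

// expression in tS3Get's "Key" field
(String) globalMap.get("row2.key")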

gopal16
Contributor
Author

Yes, after using tFlowToIterate it worked, as mentioned previously. Now I need help reading the files that I fetched with tS3Get and loading them all into a table. All the files have the same structure, so they can be loaded into a single table.

 

When I tried the job below, it threw the error "Duplicate nested type row2Struct". And when I recompiled, it threw another error saying one of the fields cannot be resolved or is not a field. I am not actually using that field in tFileInputDelimited; it is defined only in the tIterateToFlow component. If I remove the tFileInputDelimited and tDBOutput components from this job, it runs fine. I am not sure where the problem is.

 

[screenshots: 0683p000009MZrN.jpg, 0683p000009MaFe.jpg]

manodwhb
Champion II

@gopal16 , up to tS3Get there is no issue, right? It is a compilation issue; you might not have configured something correctly.

gopal16
Contributor
Author

Up to tS3Get there are no issues. When I added the tFileInputDelimited and tDBOutput components, it started throwing this error. In tFileInputDelimited, in the file name field, I am giving the same filename that is used when getting the file with tS3Get, and then I simply connected tFileInputDelimited to tDBOutput to load those files. I didn't change any other configuration.

manodwhb
Champion II

@gopal16 , you need to give the local filename (with its path) in tFileInputDelimited; do not use the global variables of tS3List. Check that.
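In practice that usually means building one local path expression and using it both as the download target in tS3Get and as the file name in tFileInputDelimited, so the input component reads exactly the file that was just downloaded. The folder, connection name, and column name below are assumptions for illustration:

// same expression in tS3Get's local file setting and in tFileInputDelimited's "File name/Stream"
"/tmp/s3_landing/" + (String) globalMap.get("row2.key")

If the key still contains the dynamic date folders, either the matching subdirectories have to exist locally or the key should be reduced to its base file name before building the path.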

gopal16
Contributor
Author

Yes, I am using the local folder in the filename. I am giving tFileInputDelimited the same filename that is set dynamically in the File option of tS3Get, since I can't hardcode the filename.

manodwhb
Champion II

@gopal16 , can you share a screenshot of tFileInputDelimited?

sushantk19
Creator

@Manohar B : is there a way to get just the latest file from S3 if my client is placing all the files in the same S3 bucket? The file name changes every hour.
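Not an answer from the thread, but one way to sketch "take only the newest object" with the AWS SDK for Java v1 (for example in a tJava ahead of tS3Get). The bucket name is a placeholder, and only the first page of up to 1,000 objects is examined here, so a paging loop like the one earlier in this thread is needed for larger buckets:

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.S3ObjectSummary;
import java.util.Comparator;

public class LatestS3Object {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        // Keep the key with the greatest LastModified timestamp.
        String latestKey = s3.listObjectsV2("my-bucket").getObjectSummaries().stream()
                .max(Comparator.comparing(S3ObjectSummary::getLastModified))
                .map(S3ObjectSummary::getKey)
                .orElse(null);
        System.out.println("Latest object: " + latestKey);
    }
}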