My scenario: every day, source files arrive in different dynamically dated folders in S3. I need to pick up the files newer than the last processed timestamp and use them in the main flow job. I am using the tS3List component to list the files by prefix (I can't give the complete path because the folders have dynamic dates). Then, in the tS3Get component, I need to get only the files that are newer than the last processed timestamp, but there are not many options available. I can only provide the tS3List current key in the Key section, and with this I am getting older, already processed files as well. Also, once I get the right files, I don't want to store them locally; I want to process them directly in the job. Please help me achieve this. Thanks!
Sorry, Manohar. With your approach, I am getting only the last file instead of all 14 files. Below is the screenshot of that.
I think I found the solution. The job below worked, and I was able to filter and get all the required files.
Now I need to work on loading these files directly into a table, without placing them locally, using the s3select option in tS3Get. If anyone has succeeded with the s3select option, please let me know.
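For readers without the screenshot, the filtering idea itself (keep only the listed files whose last-modified time is after the last processed timestamp) can be sketched in plain Java. This is only an illustration under assumptions, not the actual job: the `S3Entry` class, the `newerThan` method, and the sample keys are hypothetical, and in a real job the key and last-modified values would have to come from the S3 listing.

```java
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;

public class NewFileFilter {
    // Hypothetical stand-in for one listed S3 object: its key and last-modified time.
    public static class S3Entry {
        public final String key;
        public final Instant lastModified;

        public S3Entry(String key, Instant lastModified) {
            this.key = key;
            this.lastModified = lastModified;
        }
    }

    // Keep only entries modified strictly after the last processed timestamp.
    public static List<S3Entry> newerThan(List<S3Entry> listed, Instant lastProcessed) {
        return listed.stream()
                .filter(e -> e.lastModified.isAfter(lastProcessed))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample data only; the keys and dates are made up for the example.
        Instant lastProcessed = Instant.parse("2023-01-02T00:00:00Z");
        List<S3Entry> listed = List.of(
                new S3Entry("in/2023-01-01/a.csv", Instant.parse("2023-01-01T08:00:00Z")),
                new S3Entry("in/2023-01-02/b.csv", Instant.parse("2023-01-02T08:00:00Z")),
                new S3Entry("in/2023-01-03/c.csv", Instant.parse("2023-01-03T08:00:00Z")));

        // Print the keys that still need to be fetched and processed.
        for (S3Entry e : newerThan(listed, lastProcessed)) {
            System.out.println(e.key);
        }
    }
}
```

The comparison is strict, so a file stamped exactly at the last processed timestamp is treated as already processed. If that boundary matters for your loads, the check would need to be adjusted.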
@gopal16, you need to use tFlowToIterate to pass the files one by one to tS3Get.
Yes, after using tFlowToIterate, it worked as mentioned previously. Now I need help reading the files that I got from tS3Get and loading them all into a table. All the files have the same structure, so they can be loaded into a single table.
When I tried the job below, it threw the error: Duplicate nested type row2Struct. When I recompiled, it threw another error saying that one of the fields cannot be resolved or is not a field. I am actually not using that field in tFileInputDelimited; it is defined only in the tIterateToFlow component. If I don't have the tFileInputDelimited and tDBOutput components in this job, it runs fine. I'm not sure where the problem is.
@gopal16, there is no issue up to tS3Get, right? This is a compilation issue; you might not have configured it correctly.
Up to tS3Get there are no issues. The error appears when I add the tFileInputDelimited and tDBOutput components. In tFileInputDelimited, for the file name, I am giving the same file name that I specified when getting the file with tS3Get. Then I simply connected tFileInputDelimited to tDBOutput to load those files. I didn't change any other configuration.
@gopal16, you need to give the local file name in tFileInputDelimited; do not use the global variables of tS3List, and then check.
Yes, I am using only the local folder in the file name. In tFileInputDelimited, I am giving the same file name that is set dynamically in the File option of tS3Get, since I can't hardcode the file name.
@gopal16, can you share a screenshot of tFileInputDelimited?
@Manohar B: is there a way to get just the latest files from S3 if my client is placing all the files in the same S3 bucket? The file name changes every hour.
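One possible way to think about the "latest file" case, since the file name cannot be relied on, is to choose by last-modified time instead. The sketch below is plain Java and only an assumption about how the selection could work; the `latestKey` method and the map of key to last-modified time are hypothetical and would need to be filled from the bucket listing in a real job.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class LatestFilePicker {
    // Return the key with the most recent last-modified time, or empty if nothing is listed.
    public static Optional<String> latestKey(Map<String, Instant> lastModifiedByKey) {
        return lastModifiedByKey.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        // Sample data only; the keys and times are made up for the example.
        Map<String, Instant> listed = new LinkedHashMap<>();
        listed.put("export_0900.csv", Instant.parse("2023-01-01T09:00:00Z"));
        listed.put("export_1100.csv", Instant.parse("2023-01-01T11:00:00Z"));
        listed.put("export_1000.csv", Instant.parse("2023-01-01T10:00:00Z"));

        // Print the chosen key, or a note when the listing is empty.
        System.out.println(latestKey(listed).orElse("no files listed"));
    }
}
```

Returning an `Optional` keeps the empty-bucket case explicit, so the caller decides whether to skip the run or fail.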