Anonymous
Not applicable

Unarchive each file one at a time

Hi all,
I have some 15-20 zip files in one folder. As part of my job I have to unarchive each zip file, process the extracted .tsv file, and load it into an Oracle table. After the processed data has loaded successfully, all the extracted .tsv files should be deleted, and the zip file that was just extracted and processed should be moved to another folder (say, for example, a Processed folder in the same directory). Then the next zip file should be extracted and the same process repeated until all the zip files in the directory have been processed and moved to the other folder.
Right now I am running my job 15 times, once for each file, which is quite time-consuming. Can anyone tell me how to loop the entire process 15 times?

For example, let us say I have 15 zipped files in my C:/Test directory with the names below:
1. file1_2012-04-06
2. file2_2012-04-06
.
.
.
15. file15_2012-04-06

Right now I run the job below 15 times, once for each file:
tFileUnarchive --------> tOracleBulkExec ---------> tFileCopy --------> tFileDelete

But running this 15 times is not a professional way of doing it, so can anyone help me do the entire process in a single run? Please tell me the settings I need to configure, and share a sample job if you have one.
Thanks and Regards,
Pavan
15 Replies
Anonymous
Not applicable
Author

Try using the tWaitForFile component and link it to your job/subjob with Iterate. Set 'Trigger action when' to 'a file is created' and 'Then' to 'Continue loop'. If you want to run the main job using tRunJob, you can load the file name into a context variable and transmit it to the child job.
Anonymous
Not applicable
Author

Hi Kelebek,
I have a few questions:
1. Why should I use tWaitForFile? All my files are already in one folder, so I don't understand why I need it.
2. What should I run in tRunJob? That is, which process should be executed in the main job?
3. Can you share an image showing how to pass the file name as a context variable, how to use it, and in which component it should be used?
Thank you for the information you provided. It was helpful, but I could not understand it clearly. Can you explain or show it more clearly?
Thanks and Regards,
Pavan
Anonymous
Not applicable
Author

1. tWaitForFile works in a loop. If you set 'Max number of iterations' to 15, it will be executed 15 times and will pick up your files one by one. In the component configuration you set the file mask, which could be '*.txt' (or '*.zip' for your files), and it will then pick up all files with that extension.
2. The main job contains everything you have implemented so far (unarchiving, loading, copying, deleting).
3. The screenshot of tMap_2 from the picture in my previous post is attached. On the left is the standard tWaitForFile schema. On the right is the standard tContextLoad schema. 'FileName' is a context variable that you have to create in your jobs. When this is done, you have to tick the 'Transmit whole context' box in the tRunJob component. And that's it: now you can use the variable as your source file name, as 'context.FileName'.
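To make the idea in point 3 concrete, here is a rough plain-Java analogy of a parent job handing a file name to a child job through a context. This is only a sketch, not Talend-generated code: the class `ContextPassing`, the methods `parentJob` and `childJob`, and the use of a `Map` as the context are all assumptions made for illustration. Only the variable name 'FileName' comes from the post above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ContextPassing {

    // Hypothetical stand-in for the child job: it learns which file to
    // work on only from the context it receives.
    public static String childJob(Map<String, String> context) {
        String fileName = context.get("FileName");
        return "processing " + fileName;
    }

    // Hypothetical stand-in for the parent job: for every file it sets
    // FileName in a fresh context and runs the child once.
    public static List<String> parentJob(List<String> fileNames) {
        List<String> log = new ArrayList<>();
        for (String name : fileNames) {
            Map<String, String> context = new LinkedHashMap<>();
            context.put("FileName", name);
            log.add(childJob(context));
        }
        return log;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList(
                "C:/Test/file1_2012-04-06.zip",
                "C:/Test/file2_2012-04-06.zip");
        for (String line : parentJob(files)) {
            System.out.println(line);
        }
    }
}
```

The child never needs to know how many files there are; it just reads whatever file name the parent placed in the context for that run.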
Cheers
Anonymous
Not applicable
Author

If you wanted to, you could also automate the number of the tWaitForFile executions.
I would do it this way:
1. Use the tSystem component to run a UNIX/Windows command that counts the number of files in your folder and writes the count to a file.
2. Create a context variable to store the number of files (let's say 'FilesCount').
3. Use the file generated in step 1 as a source and load the number into the variable created in step 2 using tContextLoad.
4. Use the variable in the tWaitForFile component as the 'Max. number of iterations': Integer.parseInt(context.FilesCount)
And your process is fully automated.
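If you would rather not depend on an operating-system command, the counting in step 1 can also be done in plain Java. The sketch below is an alternative to the tSystem approach, not a description of what tSystem does. The class name `ZipCounter`, the method `countFiles`, and the '*.zip' mask are assumptions; C:/Test is the folder from the original question.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ZipCounter {

    // Counts the regular files directly inside dir whose names match
    // the glob pattern, for example "*.zip". Subfolders are not searched.
    public static int countFiles(Path dir, String glob) throws IOException {
        int count = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // The folder defaults to C:/Test, as in the original question.
        Path dir = Paths.get(args.length > 0 ? args[0] : "C:/Test");
        System.out.println(countFiles(dir, "*.zip"));
    }
}
```

Because step 4 parses the variable with Integer.parseInt, 'FilesCount' is presumably a String, so the count would have to be converted first, for example with String.valueOf.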
Anonymous
Not applicable
Author

Hi Kelebek,
Your idea is mind-blowing. Can you show me how to use the tSystem component and what code should be written in it? I am very weak at coding. Can you share images of the process you described, step by step, with the settings that need to be configured in each component? I am still learning, and I am sure that one day I will excel with guidance from all of you. You are doing a great job for a novice user like me.
Waiting for your reply!
Thanks and Regards,
Pavan

janhess
Creator II

It is easier to use tFileList to get all the files and iterate over each of them.
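In the job itself, tFileList would typically be linked to tFileUnarchive with an Iterate connection, and the current file referenced as ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")), assuming the component keeps its default name tFileList_1. For anyone who wants to see what such an iteration amounts to, here is a rough plain-Java sketch of the whole loop from the original question: unzip, load each .tsv, delete the extracted files, then move the zip. It is not Talend-generated code. The names `ZipBatch`, `processAll`, `unzip` and the `Loader` interface (a stand-in for the Oracle bulk load) are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ZipBatch {

    // Hypothetical stand-in for the Oracle bulk load of one .tsv file.
    public interface Loader {
        void load(Path tsv) throws IOException;
    }

    // Processes every *.zip in inputDir, one at a time, and returns how
    // many archives were handled. If a load throws, that archive stays in
    // inputDir (with its extracted files) and the loop stops.
    public static int processAll(Path inputDir, Path processedDir, Loader loader)
            throws IOException {
        Files.createDirectories(processedDir);

        // Collect the archives first so that moving files later does not
        // interfere with the directory listing.
        List<Path> zips = new ArrayList<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(inputDir, "*.zip")) {
            for (Path p : ds) {
                zips.add(p);
            }
        }

        for (Path zip : zips) {
            List<Path> extracted = unzip(zip, inputDir);
            for (Path f : extracted) {
                if (f.getFileName().toString().endsWith(".tsv")) {
                    loader.load(f);
                }
            }
            // Loading succeeded: remove everything this archive produced,
            // then move the archive itself out of the way.
            for (Path f : extracted) {
                Files.deleteIfExists(f);
            }
            Files.move(zip, processedDir.resolve(zip.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
        }
        return zips.size();
    }

    // Extracts all file entries of the archive into targetDir and returns
    // the paths that were written.
    static List<Path> unzip(Path zip, Path targetDir) throws IOException {
        Path base = targetDir.toAbsolutePath().normalize();
        List<Path> written = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(zip))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                Path dest = base.resolve(entry.getName()).normalize();
                // Refuse entries that would escape the target folder.
                if (!dest.startsWith(base)) {
                    throw new IOException("Unsafe zip entry: " + entry.getName());
                }
                Files.createDirectories(dest.getParent());
                Files.copy(in, dest, StandardCopyOption.REPLACE_EXISTING);
                written.add(dest);
            }
        }
        return written;
    }
}
```

Keeping a list of the files each archive produced means the cleanup still works when one zip expands into several .tsv files, and it never touches files that belong to a different archive.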
Anonymous
Not applicable
Author

You're right, janhess, I didn't think of this! It is much, much easier...
Anonymous
Not applicable
Author

Hi Kelebek,
I have posted images of the jobs I built based on your ideas for my requirement. I created a job called Sample_File_Copy_1, which does all the steps (unarchiving, loading, copying, and deleting) for a single file. You advised me to define contexts and use them in particular components. This is where my problem starts: I am totally confused about how to define the contexts and in which component to configure them. I have posted some images of the context group I created and used in Sample_File_Copy_1. I have a few questions:
1. How can I define the file name as a context variable? I will have some 15 zip files with different names in the same directory. How can I get the name of each file passed as context at each iteration?
2. Will the file name passed as context be the name of the zip file being unarchived in the tFileUnarchive component I am using in my job?
3. After I copy the first processed file, I delete it. There is a problem here that I forgot to mention earlier: each time a zip file is unarchived, it extracts some 8-10 *.tsv files. So after copying the zip file to a different directory, I should delete the zip file along with all the *.tsv files extracted from that particular zip file. How can this be accomplished? Any ideas, please?
Kindly post images of the job and the configuration properties that need to be set, step by step. Please bear with me.
Sorry for troubling you.
Thanks and Regards,
Pavan
Anonymous
Not applicable
Author

Hi all,
Any ideas or suggestions?
Kindly help me out. I urgently need to complete the above using Talend.
Thanks and Regards,
Pavan