Anonymous
Not applicable

Need to execute a job based on a file, if file exists then incremental load or else if it is new then full load

I have a scenario where, for each record in a table, hash keys are generated separately; all the hash keys are stored in one delimited file and the data in another. When new data is added to the same table, hash keys are generated for the new data and must be checked against the existing hash key file, so before that I need to check whether the hash key file exists at all. I tried the tFileExists component, but it is not working properly. How can I put a condition in place to check whether the file exists or not?

9 Replies
fdenis
Master

We cannot do the work for you, but we can help.
If you are very new to Talend, start with a training course to understand what an ETL is and how Talend works.
You have to know how to use a tool before you can build a plane with it.
Take the time to learn; it's the best way.
Anonymous
Not applicable
Author

No one needs to do the work for me. Many people are kind enough to reply to the queries posted by users. I am new to the Talend Community and thought I might get a proper solution for my issue, which is why I explained the complete scenario for better understanding.

Anonymous
Not applicable
Author

@Moe 

 

You can verify whether a file is present or not using the tFileExists component. Could you please refer to the sample scenario at the link below?

 

https://help.talend.com/reader/wDRBNUuxk629sNcI0dNYaA/g3tWtZMSyBup4eLk5HdOAg

 

In the example, if the file does not exist, that condition is caught and an error is raised through a message box. You can use a similar condition to add a new "if" clause so that, when the file is present, the flow is rerouted to your existing flow.
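As a minimal sketch of that routing (the component name tFileExists_1 is an assumption; use whatever your component is actually called), tFileExists publishes a Boolean EXISTS variable that can drive two "Run if" triggers:

```java
// "Run if" trigger towards the branch that handles an existing hash file
// (incremental load). tFileExists exposes EXISTS in the globalMap.
((Boolean) globalMap.get("tFileExists_1_EXISTS"))

// Second "Run if" trigger towards the full-load branch (file not found).
!((Boolean) globalMap.get("tFileExists_1_EXISTS"))
```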

 

There is also the case where you may have to check multiple hash files in one go. For that, keep the list of hash file names in another file or a DB table, then convert the flow to an iteration using tFlowToIterate; that way the same process can be run for each file.
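For the multi-file case, a possible sketch of the file name evaluated inside the iteration (row1, table_name, and context.hashDir are assumed names, not taken from the original job):

```java
// After tFlowToIterate, each column of the iterated row is available in
// the globalMap under "<rowName>.<columnName>". This expression could go
// into the "File name" field of tFileExists inside the loop.
context.hashDir + "/" + ((String) globalMap.get("row1.table_name")) + ".csv"
```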

 

Try it out, and share screenshots with the full job flow and component details in case you get stuck anywhere.

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂

 

Anonymous
Not applicable
Author

@nthampi 

 

Thanks for your reply.

 

I have a problem: if a table is new, the hash files are generated without issue. But if the table has already been executed once and its hash file is stored in the directory, then when new rows are added to the same table and it is executed again, I first need to check whether a hash file has already been generated for that table (the hash file is stored under the same name as the table) in the directory; only then should hash keys be generated for the new rows alone. The problem is that the file check must be done dynamically, and after finding the file I should perform an incremental load. If the file exists, the flow must be connected to one tMap; otherwise, to another. I am unable to accomplish this. It would be a great help if you could suggest any other component for this issue.


[Screenshots: Capture3.PNG, Capture4.PNG]
Anonymous
Not applicable
Author

Hi,

 

    Could you please try the tFileExists component as mentioned in the previous post? I am not able to see this component anywhere in your job flow.

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂

Anonymous
Not applicable
Author

@nthampi 

 

From tDBInput, suppose I am getting the abc table. Using the job, I created two delimited files: abc_data.csv and abc.csv (the file that holds the hash keys for each row). Now I need to add a few more rows to the existing abc table and execute the job again. Before execution, I need to check whether there is already a hash file for that table (yes, there is); that hash file is given as a lookup to tMap, and only the new data is written out separately as abc_new.csv (containing only the updated data). So I need to check whether the table name coming from the database already has a delimited file (with the same name as the table) in the local directory. If I use the tFileExists component, I can only do that check statically. The main problem is that I do not understand how to check whether the incoming table has been executed before. I thought tFileExists might work, but it is not working.
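One way the check could be made dynamic is a small tJava step that builds the expected file path from the table name and remembers the result for routing. This is only a sketch; tableName, context.hashDir, and the trigger wiring are assumptions:

```java
// tJava: build the expected hash-file path from the incoming table name
// and store whether it already exists. "tableName" and "context.hashDir"
// are assumed names; replace them with the ones your job really uses.
String tableName = (String) globalMap.get("tableName");
java.io.File hashFile = new java.io.File(context.hashDir + "/" + tableName + ".csv");
globalMap.put("hashFileExists", hashFile.exists());

// Downstream "Run if" triggers can then choose the path:
//   ((Boolean) globalMap.get("hashFileExists"))   -> lookup tMap, incremental load
//   !((Boolean) globalMap.get("hashFileExists"))  -> full-load tMap
```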


[Screenshot: Capture11.PNG]
Anonymous
Not applicable
Author

Hi,

 

First of all, why are you basing the whole execution process on a file? That is a bit risky, since the file may get corrupted or go missing at some point. If you want to track whether the flow has already been executed against the DB, add the execution details to a control table. Next time, check the control table to see whether the process has already run. I believe that is a far better approach than checking a file.
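A hedged sketch of what that control-table check could look like in a job (the table etl_control, its columns, and the component names are assumptions, not something from the original posts):

```java
// Query for a tDBInput reading the assumed control table "etl_control":
// how many successful runs are recorded for the current table?
"SELECT COUNT(*) AS run_count FROM etl_control WHERE table_name = '"
    + context.tableName + "' AND status = 'OK'"

// A following tJavaRow could remember the answer for later routing
// (input_row is the row variable generated by tJavaRow; run_count is the
// assumed schema column holding the count):
globalMap.put("alreadyLoaded", input_row.run_count > 0);

// "Run if" triggers then pick the branch, exactly like the file check:
//   ((Boolean) globalMap.get("alreadyLoaded"))   -> incremental load
//   !((Boolean) globalMap.get("alreadyLoaded"))  -> full load
```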

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂

Anonymous
Not applicable
Author

I have the same scenario: I need to check the hash keys of the previous table (hashfile.csv) against the hash keys of the updated table (generated in tMap), so that newly added rows, whose hash keys are not present in hashfile.csv, are captured as rejects into a separate CSV file (newdata.csv), and the hash keys of those updated rows are appended to hashfile.csv.

 

The problem is that if the table is being executed for the first time, there is no hash file yet; in that case a full-load job should run, and if the hash file does exist, an incremental load should run instead (as I said before).

 

Here I am unable to connect tFileInputDelimited to tMap as the lookup. Please guide me with suggestions; I am new to Talend.
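Purely to illustrate the logic being asked for (plain Java outside Talend; the file names and sample keys are made up), the full-versus-incremental decision and the reject capture come down to something like this:

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;

public class IncrementalCheckSketch {
    public static void main(String[] args) throws IOException {
        Path hashFile = Paths.get("hashfile.csv");   // assumed per-table hash file

        // Full load: the hash file does not exist yet, so nothing is "known".
        Set<String> knownKeys = Files.exists(hashFile)
                ? new HashSet<>(Files.readAllLines(hashFile))
                : new HashSet<>();

        // Stand-in for the hash keys generated in tMap for the current run.
        List<String> incomingKeys = Arrays.asList("k1", "k2", "k3");

        // Incremental load: keys not already known are the "rejects" that
        // would be written to newdata.csv.
        List<String> newKeys = new ArrayList<>();
        for (String key : incomingKeys) {
            if (!knownKeys.contains(key)) {
                newKeys.add(key);
            }
        }

        // Append the newly seen keys so the next run treats them as known.
        Files.write(hashFile, newKeys,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}
```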


[Screenshot: Capture33.PNG]
Anonymous
Not applicable
Author

Hi @supraja_sdk

 

In your scenario, you are reading multiple files in the lookup section in an iterative fashion. I would suggest consolidating the lookup data into one file (in an earlier subjob) and then doing the matching in the main flow, where you can perform the lookup against that single file. Please note that the file will grow over time, so make sure you store only the essential columns and allocate enough memory.
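A rough sketch of the consolidation idea in plain Java (in a Talend job this would typically be a tFileList iterating over the hash files and appending them with tFileOutputDelimited; all paths and names below are assumptions):

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ConsolidateHashFiles {
    public static void main(String[] args) throws IOException {
        Path hashDir = Paths.get("hashfiles");              // assumed directory of per-table hash files
        Path consolidated = Paths.get("all_hashkeys.csv");  // assumed consolidated lookup file

        List<Path> hashFiles;
        try (Stream<Path> files = Files.list(hashDir)) {
            hashFiles = files.filter(f -> f.toString().endsWith(".csv"))
                             .collect(Collectors.toList());
        }

        // Append every per-table hash file to the single consolidated
        // lookup file that the tMap would then read once.
        for (Path p : hashFiles) {
            Files.write(consolidated, Files.readAllLines(p),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
    }
}
```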

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved