Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi,
I want to implement an Incremental load in Talend itself. I have gone through some blogs and posts which suggest creating a table in db to store the job name and last run time. I want a solution that handles the incremental loading by capturing the last update date in Talend itself and while executing the job next time it should pick the value and run for the newly updated records. Or any other approach which doesn't include creating a new table for this purpose.
Thanks in advance,
Sowmiya R
Hi,
Once you execute the data between last run time and current time (during first run), the next step is to capture the current time (I mean the end time of current run) back to the DB.
This will be the start date for the next month. In your example, 04-APR-2019 was the first time you ran the job. So after job execution, you need to capture this date to the DB. So when the next time in May you are running the job, it will fetch all the records between 04-APR-2019 and 01-May-2019.
The key part is to store the date of current run back to DB so that we you are executing next time, it will take the full delta range instead of current date.
Warm Regards,
Nikhil Thampi
Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂
@sowmi333 ,either you should capture the job name and last run time in database table or file, in talend we can cannot store and which can not able to pick that in the next run.
check the below link.
http://mahadevanrv.blogspot.in/2015/04/talend-incremental-load-using-talend.html
Hi Sowmiya,
The reason for recommending to store the control values in DB is because DBs have better capabilities to recover the data easily in case of a data loss or disaster. Instead of DB, you can store this last run data in a file also. In that case, the onus of maintaining and protecting this data will be on the operational team. What will happen in case the file is accidentally deleted by a user? Your whole process will go for a toss in that case.
End of day, Talend is an ETL tool and it is not a storage medium. So trying to use Talend as the medium to store the control data is not the recommended approach. My personal preference is always to use DB for storing these type of data due to ts capabilities to come out of disaster recovery easily but you can also go by file approach.
Warm Regards,
Nikhil Thampi
Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂
Hi Sowmiya,
If you don't want to create any additional table in DB, Every time when you execute the job , you can get max(LastUpdateddate) from the Target table and make it pass in to TDBInput query like 'select * from <table> where lastUpdateDate =<value from the previous component>.
Thanks,
Joe
<Give Kudos if you found this tip is useful>
Hi,
The data is extracted from the DB to a variable in the previous step by max(LastUpdateddate). This will be stored to a context variable (context.last_run).
He meant to say, use this value from previous step when you are querying next time.
'select * from <table> where lastUpdateDate >= context.last_run'
Warm Regards,
Nikhil Thampi
Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂
Exactly, I meant the same step as you have explained.
but here the problem is, if the Target table is huge like 900 M records with 100's of columns, it will take more time to fetch the LastUpdatedDate . Always the better plan is to store the execution time in a Table for fast processing.
Thanks,
Joe
There is another way to get the last execution time OF JOB .one of the way is to use Metaservlet with below command :
----------------------------------------------------------
  Command: taskLog
----------------------------------------------------------
Description             : Retrieve the logs of the specified task
- lastExecution: true/false(default). Get the lastest execution log or not.
Requires authentication : true
Since                   : 5.1
Sample                  : 
{
  "actionName": "taskLog",
  "authPass": "admin",
  "authUser": "admin@company.com",
  "endTime": "\"2016-03-24 13:00:00\"",
  "lastExecution": false,
  "startTime": "\"2016-03-24 12:00:00\"",
  "taskId": 1
}
Specific error codes    : 
       220: Error happened when reading logs.
       225: Error happened when date range is illegal.
       226: Error appears when parsing the date.
--------------------------------------------------