Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
Qlik GA: Multivariate Time Series in Qlik Predict: Get Details
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

Incremental Load

Hi,

I want to implement an Incremental load in Talend itself. I have gone through some blogs and posts which suggest creating a table in db to store the job name and last run time. I want a solution that handles the incremental loading by capturing the last update date in Talend itself and while executing the job next time it should pick the value and run for the newly updated records. Or any other approach which doesn't include creating a new table for this purpose.

 

Thanks in advance,

Sowmiya R

Labels (2)
1 Solution

Accepted Solutions
Anonymous
Not applicable
Author

Hi,

 

   Once you execute the data between last run time and current time (during first run), the next step is to capture the current time (I mean the end time of current run) back to the DB. 

 

   This will be the start date for the next month. In your example, 04-APR-2019 was the first time you ran the job. So after job execution, you need to capture this date to the DB. So when the next time in May you are running the job, it will fetch all the records between 04-APR-2019 and 01-May-2019.

 

The key part is to store the date of current run back to DB so that we you are executing next time, it will take the full delta range instead of current date.

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂

View solution in original post

21 Replies
manodwhb
Champion II
Champion II

@sowmi333 ,either you should capture the job name and last run time in database table or file, in talend we can cannot store and which can not able to pick that in the next run.

 

check the below link.

http://mahadevanrv.blogspot.in/2015/04/talend-incremental-load-using-talend.html

Anonymous
Not applicable
Author

Hi Sowmiya,

 

     The reason for recommending to store the control values in DB is because DBs have better capabilities to recover the data easily in case of a data loss or disaster. Instead of DB, you can store this last run data in a file also. In that case, the onus of maintaining and protecting this data will be on the operational team. What will happen in case the file is accidentally deleted by a user? Your whole process will go for a toss in that case.

 

    End of day, Talend is an ETL tool and it is not a storage medium. So trying to use Talend as the medium to store the control data is not the recommended approach. My personal preference is always to use DB for storing these type of data due to ts capabilities to come out of disaster recovery easily but you can also go by file approach.

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂

joearun4u
Creator
Creator

Hi Sowmiya,

 

If you don't want to create any additional table in DB, Every time when you execute the job , you can get max(LastUpdateddate) from the Target table and make it pass in to TDBInput query like 'select * from <table> where lastUpdateDate =<value from the previous component>.

 

Thanks,

Joe

 

<Give Kudos if you found this tip is useful>

Anonymous
Not applicable
Author

Hi Joe,
I have a doubt what you mean by the previous component in the phrase "TDBInput query like 'select * from <table> where lastUpdateDate =<value from the previous component>.". Can you please elaborate on it
Anonymous
Not applicable
Author

Hi,

 

    The data is extracted from the DB to a variable in the previous step by max(LastUpdateddate). This will be stored to a context variable (context.last_run).

 

    He meant to say, use this value from previous step when you are querying next time.

 

'select * from <table> where lastUpdateDate >= context.last_run'

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂

 

 

Anonymous
Not applicable
Author

Hi @nthampi ,

Thanks much for your suggestion. Let me it try out and update you 🙂

 

Thanks,

Sowmiya R

joearun4u
Creator
Creator

Exactly, I meant the same step as you have explained.

 

but here the problem is, if the Target table is huge like 900 M records with 100's of columns, it will take more time to fetch the LastUpdatedDate . Always the better plan is to store the execution time in a Table for fast processing.

 

Thanks,

Joe

 

akumar2301
Specialist II
Specialist II

There is another way to get the last execution time OF JOB .one of the way is to use Metaservlet with below command :

 

----------------------------------------------------------
  Command: taskLog
----------------------------------------------------------
Description             : Retrieve the logs of the specified task
- lastExecution: true/false(default). Get the lastest execution log or not.
Requires authentication : true
Since                   : 5.1
Sample                  : 
{
  "actionName": "taskLog",
  "authPass": "admin",
  "authUser": "admin@company.com",
  "endTime": "\"2016-03-24 13:00:00\"",
  "lastExecution": false,
  "startTime": "\"2016-03-24 12:00:00\"",
  "taskId": 1
}
Specific error codes    : 
       220: Error happened when reading logs.
       225: Error happened when date range is illegal.
       226: Error appears when parsing the date.
--------------------------------------------------

 

Anonymous
Not applicable
Author

Hi @uganesh ,

 

Could you please elaborate on the Metaservlet as I'm new to Talend

 

Thanks,

Sowmiya