Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi,
I want to implement an Incremental load in Talend itself. I have gone through some blogs and posts which suggest creating a table in db to store the job name and last run time. I want a solution that handles the incremental loading by capturing the last update date in Talend itself and while executing the job next time it should pick the value and run for the newly updated records. Or any other approach which doesn't include creating a new table for this purpose.
Thanks in advance,
Sowmiya R
Hi @joearun4u ,
Yes, as you said I'm having over a million records and When I executed as suggested by you and @nthampi I found that new records are getting inserted but I guess if we are running it daily then the approach will work but if we want to run once a month or something like that for eg : last run date is 04-APR-2019 again I will run on 04-MAY-2019 in this case If I take MAX(LAST_UPDATE_DATE) then it takes only the records from the date 04-MAY-2019 leaving behind the records for one month. Did I miss anything Kindly guide further
Thanks,
Sowmiya R
Hi,
Once you execute the data between last run time and current time (during first run), the next step is to capture the current time (I mean the end time of current run) back to the DB.
This will be the start date for the next month. In your example, 04-APR-2019 was the first time you ran the job. So after job execution, you need to capture this date to the DB. So when the next time in May you are running the job, it will fetch all the records between 04-APR-2019 and 01-May-2019.
The key part is to store the date of current run back to DB so that we you are executing next time, it will take the full delta range instead of current date.
Warm Regards,
Nikhil Thampi
Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂
Hi @nthampi ,
I have done this approach already and working fine but the customer doesn't want to create a new table in DB and want the Talend itself to capture the last run time and pick up the time while executing the next time. SO I'm looking for any other alternative approach that Talend alone can handle
Thanks,
Sowmiya R
Hi Sowmiya,
Definitely It will work..
You have to capture the Max(lastUpdateDate) along with time for eg, 04-Mar-2019 12:37:08. Then you have to pass the value to input query as 'select * from <SourceTableName> where lastUpdatedate > '04-Mar-2019 12:37:08'. So it will fetch all the records which is greater than the given date '04-Mar-2019 12:37:08'.
Thanks,
Joe
<Don't forget to give kudos>
Hi Sowmiya,
Could you please share what is the compelling reason shared by customer for not using the DB approach? It is quite an established approach used by many enterprise customers throughout the globe.
So I am really interested to know why the customer is afraid to have a configuration table to handle this data. If customer is looking to store the information inside Talend, we need to educate the customer that Talend is an ETL tool and not a data storage medium.
Our Talend community members can definitely shed some light in this area if customer would like to discuss on this part in Talend Community through this post itself.
Apart from that, I believe the original question for this post is already covered by the posts. If you are happy with the details provided for using DB in this scenario, could you please mark the topic as resolved by marking all the posts which helped you to arrive at the solution?
Warm Regards,
Nikhil Thampi
Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂
hi sowmya,
can you share the incremental job to me
Hi,
Can you help me create incremental job. I am new to Talend.
I am attaching the context variable defined by me at staging level. we have 2 fields updated_at(coming from source system) and Load_at (defined for capturing the data load time) in staging table. I am getting the below error when i run the talend job;
Null value will be used for context parameter last_run: Unparseable date: " max(load_at)"
[statistics] connecting to socket on port 3920
[statistics] connected
Exception in component tDBInput_1 (job_Staging_000_sylius_Address)
java.sql.SQLSyntaxErrorException: Unknown column 'context.last_run' in 'where clause'
Steps followed by me are as below,
1. create a context variable in job.
2. Use this context variable in DB input query as attached
3. run the job.
what is issue in this design? i am not creating any DB table to store this value.
Hi Sowmiya, Can you please share your job design and screenshot of your context variable defination. I have a same scenario. My source is like MySQL DB and Target is Redshift DB.
context variables are passed this way
"select * from sylius_address where updated_at> "+context.lastrun+"";