Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
Qlik GA: Multivariate Time Series in Qlik Predict: Get Details
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

Incremental Load

Hi,

I want to implement an Incremental load in Talend itself. I have gone through some blogs and posts which suggest creating a table in db to store the job name and last run time. I want a solution that handles the incremental loading by capturing the last update date in Talend itself and while executing the job next time it should pick the value and run for the newly updated records. Or any other approach which doesn't include creating a new table for this purpose.

 

Thanks in advance,

Sowmiya R

Labels (2)
21 Replies
Anonymous
Not applicable
Author

Hi @joearun4u ,

 

Yes, as you said I'm having over a million records and When I executed as suggested by you and @nthampi I found that new records are getting inserted but I guess if we are running it daily then the approach will work but if we want to run once a month or something like that for eg : last run date is 04-APR-2019 again I will run on 04-MAY-2019 in this case If I take MAX(LAST_UPDATE_DATE) then it takes only the records from the date 04-MAY-2019 leaving behind the records for one month. Did I miss anything Kindly guide further

 

Thanks,

Sowmiya R 

Anonymous
Not applicable
Author

Hi,

 

   Once you execute the data between last run time and current time (during first run), the next step is to capture the current time (I mean the end time of current run) back to the DB. 

 

   This will be the start date for the next month. In your example, 04-APR-2019 was the first time you ran the job. So after job execution, you need to capture this date to the DB. So when the next time in May you are running the job, it will fetch all the records between 04-APR-2019 and 01-May-2019.

 

The key part is to store the date of current run back to DB so that we you are executing next time, it will take the full delta range instead of current date.

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂

Anonymous
Not applicable
Author

Hi @nthampi ,

 

I have done this approach already and working fine but the customer doesn't want to create a new table in DB and want the Talend itself to capture the last run time and pick up the time while executing the next time. SO I'm looking for any other alternative approach that Talend alone can handle

 

Thanks,

Sowmiya R

joearun4u
Creator
Creator

Hi Sowmiya,

 

Definitely It will work..

 

You have to capture the Max(lastUpdateDate) along with time for eg, 04-Mar-2019 12:37:08. Then you have to pass the value to input query as 'select * from <SourceTableName> where lastUpdatedate '04-Mar-2019 12:37:08'. So it will fetch all the records which is greater than the given date '04-Mar-2019 12:37:08'.

 

Thanks,

Joe

<Don't forget to give kudos>

Anonymous
Not applicable
Author

Hi Sowmiya,

 

     Could you please share what is the compelling reason shared by customer for not using the DB approach? It is quite an established approach used by many enterprise customers throughout the globe. 

 

     So I am really interested to know why the customer is afraid to have a configuration table to handle this data. If customer is looking to store the information inside Talend, we need to educate the customer that Talend is an ETL tool and not a data storage medium. 

 

     Our Talend community members can definitely shed some light in this area if customer would like to discuss on this part in Talend Community through this post itself. 

 

      Apart from that, I believe the original question for this post is already covered by the posts. If you are happy with the details provided for using DB in this scenario, could you please mark the topic as resolved by marking all the posts which helped you to arrive at the solution?

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂

Vijay_K_N
Contributor
Contributor

hi sowmya,

 

can you share the incremental job to me 

sushantk19
Creator
Creator

Hi,

Can you help me create incremental job. I am new to Talend.

I am attaching the context variable defined by me at staging level. we have 2 fields updated_at(coming from source system) and Load_at (defined for capturing the data load time) in staging table. I am getting the below error when i run the talend job;

 

Null value will be used for context parameter last_run: Unparseable date: " max(load_at)"
[statistics] connecting to socket on port 3920
[statistics] connected
Exception in component tDBInput_1 (job_Staging_000_sylius_Address)
java.sql.SQLSyntaxErrorException: Unknown column 'context.last_run' in 'where clause'

 

Steps followed by me are as below,

1. create a context variable in job.

2. Use this context variable in DB input query as attached

3. run the job.

what is issue in this design? i am not creating any DB table to store this value.


DB_input.png
incremental_context.png
sushantk19
Creator
Creator

Hi Sowmiya, Can you please share your job design and screenshot of your context variable defination. I have a same scenario. My source is like MySQL DB and Target is Redshift DB.

sushantk19
Creator
Creator

@manodwhb: can you please share your job design and screenshot of your context variable defination. I have a similar scenario. My source is MySQL Db and target is Redshift DB.
Vijay_K_N
Contributor
Contributor

context variables are passed this way

"select * from sylius_address where updated_at> "+context.lastrun+"";