MoeE
Partner - Specialist

Will Replicate skip changes in the task's sorter file if I resume by timestamp?

Hi,

I have a theoretical scenario that I'm unable to test at the moment; it came up because I saw some strange behaviour earlier with a Qlik Replicate task. As you may be aware, this Qlik Community article states that a Replicate task can be resumed from the point where it last left off using the "last source timestamp", which can be found in the task logs after the task stops. That timestamp can then be used to continue processing from where the task left off and avoid missing any changes.

Here are my questions:

1. I saw a Qlik Replicate task that had 2.5 hours of latency. When the task was stopped, the task logs showed a "last source timestamp" that was only 1 minute back in time. This piqued my curiosity. If I had resumed from that "last source timestamp" in the log file, would I have lost the last 2.5 hours of changes? I don't recall whether it was source latency or target latency, but that shouldn't make a difference, right?

2. Also, if there were a lot of changes that were stored on disk in the task's sorter files, and I resumed the task by the "last source timestamp" (say it was 1 minute back in time), would Replicate skip all the changes in the sorter files or would it apply them?

Thank you.

Kind regards,

Mohammed


4 Replies
OritA
Support

Hi Mohammed, 

Regarding your questions:

1. If you simply resume a task, Replicate will take the last saved task state and resume from the last confirmed update to the target, so in this case it ensures that no information will be missed.

There is another option to resume a task: using the Advanced Run options, a task can be resumed from a specific timestamp or from a specific SCN/LSN (depending on the source endpoint). In this case Replicate will start the replication from the chosen point, and updates will be replicated to the target accordingly.

2. If the target endpoint cannot handle the updates at the pace they are performed (for any reason), the sorter will keep the updates on disk (each file represents a transaction). Once the target is available to receive the updates, the sorter will send them in the order they were read, and once the updates to the target are confirmed, the corresponding files will be removed from disk. If the task is reloaded, all the sorter files that were kept on disk will be removed, since they no longer need to be kept.

Hope this clarifies the behaviour. Please let us know if you have any additional questions. 

Regards,
Orit

MoeE
Partner - Specialist
Author

Hi Orit,

 

I appreciate the response; this makes sense. However, given the strange behaviour I saw earlier, where a task with 2.5 hours of latency showed a "last source timestamp" that was only 1 minute back in time, I'm now skeptical that resuming by the "last source timestamp" will pick up from the correct point without losing any data.

The alternative method I'm using now is to record the time at which the task was stopped, and the latency at that time. Then I use these two values to calculate the Task Resume Timestamp.

Task Resume Timestamp = (Time the task was stopped) - (latency in minutes) - (10 minutes)

The extra 10-minute buffer is an additional precaution.

I'm aware this is not the ideal way to resume a task from where it left off, as it will likely result in duplicates; however, it looks like a good alternative because it gives us peace of mind against missing data.
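For illustration, here is a minimal sketch of that calculation in Python. The stop time and latency below are made-up placeholder values that you would take from your own monitoring; the printed value is what I would then enter as the Advanced Run start timestamp.

```python
from datetime import datetime, timedelta

# Placeholder values recorded when stopping the task (hypothetical).
stop_time = datetime(2024, 5, 14, 15, 30, 0)   # time the task was stopped
latency = timedelta(minutes=150)               # latency observed at stop time (2.5 hours)
buffer = timedelta(minutes=10)                 # extra precaution against missed changes

# Task Resume Timestamp = (time the task was stopped) - (latency) - (10 minutes)
resume_timestamp = stop_time - latency - buffer
print(resume_timestamp.strftime("%Y-%m-%d %H:%M:%S"))
```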

 

I will try to test this next week to get a better understanding of what could have happened. I will double-check whether the task was a Log Stream task, and whether the latency I saw was source latency on the Log Stream task that was propagated to the child task, in which case it would have had no effect. This is the only scenario I can think of where the "last source timestamp" could behave as described. I'll let you know the results; otherwise, I remain skeptical and confused, and will continue using my alternative method. Thanks.

 

Regards,

Mohammed

shashi_holla
Support

@MoeE 

-> If a task has latency, with files sitting on disk in the sorter, and you then start it from a timestamp, Replicate will wipe out all the sorter files and you will lose the pending transactions.

-> A simple Resume is the right approach in this scenario, as everything will remain intact.

-> If an Advanced Run is required, you need to check the last record updated/inserted in the target and choose that timestamp, or add a 5-minute buffer (see the sketch below).
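To illustrate that last point, here is a minimal sketch, assuming the target is reachable over ODBC and the target table has a column recording when each row was last changed. The DSN, table name ("orders"), and column name ("last_updated") are hypothetical placeholders, not anything defined by Replicate.

```python
from datetime import timedelta
import pyodbc

# Hypothetical connection to the target database.
conn = pyodbc.connect("DSN=my_target_dw;UID=report_user;PWD=changeme")
cursor = conn.cursor()

# Most recent change that actually reached the target table.
cursor.execute("SELECT MAX(last_updated) FROM orders")
last_applied = cursor.fetchone()[0]

# Subtract a small buffer, as suggested above, and use this value as the
# Advanced Run start timestamp (duplicates are possible, missed changes are not).
resume_from = last_applied - timedelta(minutes=5)
print(resume_from)
```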

MoeE
Partner - Specialist
Author

Hi @shashi_holla, @OritA,

 

Thanks for the responses.

 

Regards,

Mohammed