
Qlik Replicate LOGSTREAM Timeout while waiting to get data from audit file

Pedro_Lopez
Support

Last Update:

Feb 1, 2022 7:38:40 AM

Updated By:

Sonja_Bauernfeind

Created date:

Feb 1, 2022 4:31:15 AM

In some cases, when Replicate cannot read from the Staging Folder of a LogStream task for whatever reason (a corrupted folder, lack of disk space, etc.), it can be difficult to resume the LogStream task even after the initial issue is resolved.

You might see errors like the following when trying to resume from a timestamp:

[UTILITIES ]E: Failed to write to audit file  <audit_folder directory>
[UTILITIES ]E: Timeout while waiting to get data from audit file [1002521] (at_audit_file.c:637)

[UTILITIES ]E: Error reading audit batch [1002509] (at_audit_file.c:679)

 

Environment

  • Qlik Replicate 2021.11 (not exclusive)
  • Linux RHEL / Windows 

 

Resolution

 

  1. Stop all LogStream and replication tasks.
  2. Kill the Replicate sessions and processes, as the audit file is locked by the process.
  3. Rename the audit folder containing the problematic audit file (the folder configured as the "Staging Folder" under the endpoint settings in the Replicate UI). If the task still finds the renamed folder, move it out of the staging root entirely, since the LogStream parent scans every subfolder in its root (see the comments below).
  4. Resume the LogStream task from a timestamp a few hours before the initial error, then resume the replication tasks from the same timestamp.
  5. If the issue persists, a reload of the task will be needed.
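Steps 2 and 3 can be sketched as a small shell script. This is a minimal sketch against a throwaway directory, not the exact procedure: in a real environment, STAGING would be the "Staging Folder" path from the endpoint settings, and the kill step assumes the Replicate server process is named repctl, as described in the Cause section.

```shell
#!/bin/sh
set -e
# Demo against a throwaway directory; in a real environment STAGING would
# be the "Staging Folder" path from the LogStream endpoint settings.
STAGING="$(mktemp -d)/audit_service"
mkdir -p "$STAGING"

# Step 2: kill any Replicate session still holding the audit file open
# (repctl is the Replicate server process; this is a no-op if none runs).
pkill -f repctl 2>/dev/null || true

# Step 3: rename the audit folder so a fresh one is created on resume.
mv "$STAGING" "${STAGING}_beforeCorruption"
```

After the rename, resuming from a timestamp lets the task write into a freshly created staging folder instead of the locked or corrupted one.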

 

Cause 

Usually, the Replicate process (repctl) is still locking the audit file that was being written or read when the issue occurred.
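On Linux, one way to confirm which process still holds the audit file open is to scan /proc for file descriptors pointing at it. This is a hedged sketch using a throwaway file and a background tail process as a stand-in for repctl and the real audit file; it is Linux-only.

```shell
#!/bin/sh
# Demo: trace which PID holds a file open, as you would to confirm that
# repctl still has the audit file locked. Linux-only (relies on /proc).
F="$(mktemp)"
tail -f "$F" >/dev/null 2>&1 & HOLDER=$!
sleep 1

# Collect every /proc/<pid> whose fd table points at the file.
HELD_BY=""
for fd in /proc/[0-9]*/fd/*; do
    if [ "$(readlink "$fd" 2>/dev/null)" = "$F" ]; then
        HELD_BY="$HELD_BY ${fd%/fd/*}"
    fi
done
echo "held by:$HELD_BY"

kill "$HOLDER"
```

Against a real staging folder, the same loop (or tools such as fuser/lsof, where installed) would point back at the repctl PID to kill in step 2 of the resolution.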

 

Comments
joseph_jbh
Contributor III

Hey @Pedro_Lopez , thanks for the info!
When I tried to follow the steps, the task was smart enough to look into the renamed folder:

00021812: 2022-09-20T14:13:58 [UTILITIES ]T: open audit file K:\Replicate\logstream\DCS_OUTBOUND\lspDCS_OUTBOUND\LOG_STREAM\audit_service\20210205123530927654_beforeCorruption\7012 for write (at_audit_writer.c:506)
00021812: 2022-09-20T14:13:58 [UTILITIES ]T: Reading audit file 'K:\Replicate\logstream\DCS_OUTBOUND\lspDCS_OUTBOUND\LOG_STREAM\audit_service\20210205123530927654_beforeCorruption\7012' with header version '1' (at_audit_file.c:399)

That's after a resume-by-timestamp. Any thoughts?

Sonja_Bauernfeind
Digital Support

Hello @joseph_jbh 

Have you attempted the reload of the task (the last step, if the resume does not work), rather than only resuming by timestamp?

All the best,
Sonja 

joseph_jbh
Contributor III

Hi Sonja - thanks for replying. I'm sure a reload would work, even if I have to clean out the log_stream folder, but I'm following Pedro's tip as a way to avoid that. Some of our LogStream parents supply nearly 75 child tasks, which would all need to be reloaded too.

Perchance, have you seen this symptom on non-HA deployments of Replicate? Ours is an HA deployment using a Windows failover cluster and shared storage. I'm curious whether this is contributing to the problem.

Sonja_Bauernfeind
Digital Support

Hello Joseph,

At this point, I would recommend sending that query over to our Qlik Replicate forum directly as it would require additional investigation. 

All the best,
Sonja 

joseph_jbh
Contributor III

Understood, thanks Sonja.

Kohila
Contributor III

Team,

Have any solutions (other than reloading) been found for the parent task timeout that occurs while fetching data from the audit file? We encountered the identical issue today, which required us to reload both the LogStream and replication tasks.

The task errored and is in a stopped state. No command line shows up in the Processes tab for the failed task, so we could not kill the session locking the audit file, and we had no luck with an advanced timestamp. Please let us know if a solution or workaround for resuming the task has already been figured out.

 

Thanks,

Kohila

joseph_jbh
Contributor III

I think the article is in error: a rename of the folder isn't sufficient. I've since discovered that a LogStream parent (LSP) opens every subfolder in its root and examines the contents. The folder would need to be moved out of the root in order to hide it, if that's the goal.

At any rate, resuming by timestamp will start a new LSP timeline. You can then resume the children by SCN=0, or with the same timestamp you started the parent with.
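A minimal sketch of that correction, using throwaway directories (in practice, ROOT would stand for the LogStream staging root, and audit_service for the problematic folder beneath it):

```shell
#!/bin/sh
set -e
# Demo layout standing in for the LogStream staging root.
ROOT="$(mktemp -d)"
mkdir -p "$ROOT/audit_service/7012"

# A rename inside the root is not enough: the LSP scans every subfolder
# of its root, so move the folder out of the root entirely.
QUARANTINE="$(mktemp -d)"
mv "$ROOT/audit_service" "$QUARANTINE/"
```

With the folder outside the staging root, the LSP can no longer find it when it enumerates its subfolders on resume.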

Out of curiosity, what was the root cause of your timeout? Mine was a corrupted file, seemingly related to multiple quick failovers during patching (we now bring down the Qlik services before patching).

aarun_arasu
Support

Hi @Kohila,

Have you tried the below steps, as mentioned in this article?

1. Stop all LogStream and replication tasks.

2. Kill the Replicate sessions and processes, as the audit file is locked by the process.

3. Rename the audit folder of the problematic audit file (the folder described under the endpoint settings, "Storage path:", in the Replicate UI). This normally means you are creating a new staging folder.

4. Resume the LogStream task from a timestamp a few hours before the initial error, then resume the replication tasks from the same timestamp.

If the above steps did not help, then a reload is required.

That said, there are a few cases where the lock gets released after a server reboot; you can consider this option if it is feasible.

aarun_arasu
Support

Hello @joseph_jbh,

The timeout may have occurred because the Replicate process (repctl) was locking the audit file while it was being written or read when the issue happened.

Alternatively, the file might have been corrupted, causing Replicate to make multiple attempts to read it and fail continuously because of the corruption.

Those are the possible causes of this issue.

 

Regards

Arun
