Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi, we have an Oracle to Oracle replication with a reasonably high CDC throughput.
We have a single uncommitted transaction of around 4.5Mil changes, waiting for source commit.
We then get an error:
Stream component 'st_0_transaction_sorter' terminated
Stream component failed at subtask 0, component st_0_transaction_sorter
Error executing data handler
The Transaction Storage Swap cannot write Event
and checking the windows event logs we have:
'Windows - Out of Virtual Memory'
I'm surprised this happens. I expected Replicate to switch to using disk / tswp files before this would occur.
It seems it might be a competing windows process scheduled to start at 07:00, as we have seen this a number of times at exactly 07:00.
We have 32Gb Ram on the VM with 8Gb Paging File size and Gigabytes of free disk space.
I have previously adjusted my 'common settings' to:
I can ask for the virtual memory to be increased.
I'm concerned that by just increasing the virtual memory, it may just go wrong again once that limit is reached.
Guidance please?
Ian
Hello @IanM ,
Welcome to Qlik Community forum and thanks for reaching out here.
In general, Qlik Replicate fully utilizes the memory for data processing, certainly it depends on tasks settings and the server memory resources. Let's focus on a single task first; if there are multiple tasks running in parallel on the same server, then they share the CPU/Memory/HDD space etc resources and they may impact each other.
In your scenario, when the huge transaction (around 4.5Mil changes) start, Replicate will start getting the changes (by reading Oracle redo log) and try to cache the changes (chunk by chunk) in memory to wait for submit, or rollback transaction flag. If the current allocated memory is insufficient to hold the transaction then Replicate will ask for additional memory allocation: If there are enough memory to allocate to Replicate, then keep the change data in memory; Repeat above step until the memory allocation reached task memory setting "Total transactions memory size exceeds", then the data will be offload from memory to harddisk, see Transaction Offload Tuning.
However while OS trying to allocate more memory to Replicate this time, there are no available memory for allocate, we got 'Windows - Out of Virtual Memory'.
Looks to me the 32G RAM with Gigabytes of HDD are insufficient for so high CDC throughput and big transaction. This is the Recommended hardware configuration, Extra-Large system or more powerful server is highly recommended.
Hope this helps.
Regards,
John.
The error generates from Windows itself being asked too much, which may or might not be due the Replicate configurations.
Clearly, at least for the short while the active processes needed more than 32GB + 8GB = 40 GB. It could be this was due to allowing the Replicate tasks to grow too big, for example the "Task settings - Change Processing Tuning - Force apply a batch when processing memory exceeds (MB): " has a direct effect on the memory you allow replicate tasks to allocate. Similarly Full Load Tuning - "Maximum number of tables" times "Commit rate during full load:" if you go silly there, the system will behave silly as requested.
That 8GB swapspace is sort of a reserve or overflow area which could simply be increase to also be for example 32 GB. If you dip into that, then you may find that the system will just slow down dramatically instead of programs crashing out when using disks instead of memory. Better perhaps, but not great. Best is to control (limit) the programs themselves or the number/type of programs requested to run at the same time.
Best is to work with your Server Manager to have a look at which process uses what memory leading up to the problem moment. They may have specialist tools with historic usage views available, or they may use Windows primitive like "tasklist | sort /+64 " or "Get-WmiObject Win32_Process". Examples of both usages below.
Hein.
CMD - as administrator
C:\scripts>tasklist | sort /+64
System Idle Process 0 Services 0 8 K
smss.exe 404 Services 0 380 K
SystemSettings.exe 5312 Console 1 516 K
:
repctl.exe 15488 Services 0 177,420 K
:
chrome.exe 1736 Console 1 456,792 K
sqlservr.exe 6312 Services 0 501,672 K
chrome.exe 2160 Console 1 535,064 K
chrome.exe 8184 Console 1 746,148 K
Image Name PID Session Name Session# Mem Usage
oracle.exe 6184 Services 0 1,607,756 K
========================= ======== ================ =========== ============
PowerShell as Adminstrator:
PS C:\scripts> ./vm
Pid HEINV-LPW10 2023-08-10 15:41:28 Elapsed(s) CPU(s) Reads Writes VM(MB) WS(MB)
--- ------------------------------- ---------- ------ ----- ------ ------ ------
8352 C:\Program Files\Attunity\Repl 4566359 2070 7335 57 782 133
3168 C:\Program Files\Attunity\Repl 4566351 3481 30509240 1272 357 68
15488 Oracle_to_Oracle 127.0.0.1:355 510645 110 81700 29553 464 173
Notice the Working Set (actually in memory) for the single Replicate Task
on this server pid=15488 reported by both as about 170MB.
The Powershell script also shows the Virtual Memory reserved by the task
That is 464 MB, a lot more than the Working Set.
Script itself:
PS C:\scripts> type vm.ps1
$tmp = $env:COMPUTERNAME + ' ' + (Get-Date -Format "yyyy-MM-dd HH:mm:ss")
Get-WmiObject Win32_Process -filter "name like 'rep%.exe'" | Format-Table -AutoSize @{Label="Pid"; Expression={$_.ProcessId}},
@{ label=$tmp; expression={ (($_.CommandLine -replace("^.*?reptasksrv","")) + ' '*30 ).substring(1,30) }},
# @{ label=$tmp; expression={ (($_.CommandLine).substring(1,60) )}},
@{ Label="Elapsed(s)"; Expression={[math]::floor((New-TimeSpan $_.ConvertToDateTime($_.CreationDate)).TotalSeconds)}},
@{ Label="CPU(s)"; Expression={[math]::floor(($_.UserModeTime + $_.KernelModeTime) / 10000000)}},
@{ Label="Reads"; Expression={$_.ReadOperationCount}},
@{ Label="Writes"; Expression={$_.WriteOperationCount}},
@{ Label="VM(MB)"; Expression={[math]::floor($_.VM/1048576)}},
@{ Label="WS(MB)"; Expression={[math]::floor($_.WS/1048576)}}
As an update - we have found the issue and am posting here as it may help someone else.
Our windows VM support team was able to wind back their monitoring software to 07:00 on the day of the issue and could see all available memory was taken. 838Mb by Replicate, but 23Gb by a process called tracerpt.exe.
Tracerpt appears to be an automated report generating tool looking at the windows system logs.
So we had a look at the logs (Event Viewer: Windows > System) and there was a repeating 'Service Control Manager' error every 30 seconds.
The error was 'Attunity Replicate Console Service' reporting that it could not start and would try again in 30 seconds.
Bearing in mind that Qlik Replicate hasn't used the 'Attunity' name for quite a few upgrades, this repeating error must have been going on for well over a year, every 30 seconds.
It appears the Replicate upgrade failed to fully remove the previous service.
So we believe the Tracerpt was trying to compile a report and used the 23Gb RAM. This was in competition with Replicate trying to deal with our non-committed 'long running transaction', causing Replicate to fail.
We have increased the RAM as a precaution, Disabled the 'auto re-start' on the 'Attunity Replicate Console Service' and also requested that the windows system logs are deleted or archived as soon as possible.
Re-test scheduled..
Hello @IanM ,
Welcome to Qlik Community forum and thanks for reaching out here.
In general, Qlik Replicate fully utilizes the memory for data processing, certainly it depends on tasks settings and the server memory resources. Let's focus on a single task first; if there are multiple tasks running in parallel on the same server, then they share the CPU/Memory/HDD space etc resources and they may impact each other.
In your scenario, when the huge transaction (around 4.5Mil changes) start, Replicate will start getting the changes (by reading Oracle redo log) and try to cache the changes (chunk by chunk) in memory to wait for submit, or rollback transaction flag. If the current allocated memory is insufficient to hold the transaction then Replicate will ask for additional memory allocation: If there are enough memory to allocate to Replicate, then keep the change data in memory; Repeat above step until the memory allocation reached task memory setting "Total transactions memory size exceeds", then the data will be offload from memory to harddisk, see Transaction Offload Tuning.
However while OS trying to allocate more memory to Replicate this time, there are no available memory for allocate, we got 'Windows - Out of Virtual Memory'.
Looks to me the 32G RAM with Gigabytes of HDD are insufficient for so high CDC throughput and big transaction. This is the Recommended hardware configuration, Extra-Large system or more powerful server is highly recommended.
Hope this helps.
Regards,
John.
The error generates from Windows itself being asked too much, which may or might not be due the Replicate configurations.
Clearly, at least for the short while the active processes needed more than 32GB + 8GB = 40 GB. It could be this was due to allowing the Replicate tasks to grow too big, for example the "Task settings - Change Processing Tuning - Force apply a batch when processing memory exceeds (MB): " has a direct effect on the memory you allow replicate tasks to allocate. Similarly Full Load Tuning - "Maximum number of tables" times "Commit rate during full load:" if you go silly there, the system will behave silly as requested.
That 8GB swapspace is sort of a reserve or overflow area which could simply be increase to also be for example 32 GB. If you dip into that, then you may find that the system will just slow down dramatically instead of programs crashing out when using disks instead of memory. Better perhaps, but not great. Best is to control (limit) the programs themselves or the number/type of programs requested to run at the same time.
Best is to work with your Server Manager to have a look at which process uses what memory leading up to the problem moment. They may have specialist tools with historic usage views available, or they may use Windows primitive like "tasklist | sort /+64 " or "Get-WmiObject Win32_Process". Examples of both usages below.
Hein.
CMD - as administrator
C:\scripts>tasklist | sort /+64
System Idle Process 0 Services 0 8 K
smss.exe 404 Services 0 380 K
SystemSettings.exe 5312 Console 1 516 K
:
repctl.exe 15488 Services 0 177,420 K
:
chrome.exe 1736 Console 1 456,792 K
sqlservr.exe 6312 Services 0 501,672 K
chrome.exe 2160 Console 1 535,064 K
chrome.exe 8184 Console 1 746,148 K
Image Name PID Session Name Session# Mem Usage
oracle.exe 6184 Services 0 1,607,756 K
========================= ======== ================ =========== ============
PowerShell as Adminstrator:
PS C:\scripts> ./vm
Pid HEINV-LPW10 2023-08-10 15:41:28 Elapsed(s) CPU(s) Reads Writes VM(MB) WS(MB)
--- ------------------------------- ---------- ------ ----- ------ ------ ------
8352 C:\Program Files\Attunity\Repl 4566359 2070 7335 57 782 133
3168 C:\Program Files\Attunity\Repl 4566351 3481 30509240 1272 357 68
15488 Oracle_to_Oracle 127.0.0.1:355 510645 110 81700 29553 464 173
Notice the Working Set (actually in memory) for the single Replicate Task
on this server pid=15488 reported by both as about 170MB.
The Powershell script also shows the Virtual Memory reserved by the task
That is 464 MB, a lot more than the Working Set.
Script itself:
PS C:\scripts> type vm.ps1
$tmp = $env:COMPUTERNAME + ' ' + (Get-Date -Format "yyyy-MM-dd HH:mm:ss")
Get-WmiObject Win32_Process -filter "name like 'rep%.exe'" | Format-Table -AutoSize @{Label="Pid"; Expression={$_.ProcessId}},
@{ label=$tmp; expression={ (($_.CommandLine -replace("^.*?reptasksrv","")) + ' '*30 ).substring(1,30) }},
# @{ label=$tmp; expression={ (($_.CommandLine).substring(1,60) )}},
@{ Label="Elapsed(s)"; Expression={[math]::floor((New-TimeSpan $_.ConvertToDateTime($_.CreationDate)).TotalSeconds)}},
@{ Label="CPU(s)"; Expression={[math]::floor(($_.UserModeTime + $_.KernelModeTime) / 10000000)}},
@{ Label="Reads"; Expression={$_.ReadOperationCount}},
@{ Label="Writes"; Expression={$_.WriteOperationCount}},
@{ Label="VM(MB)"; Expression={[math]::floor($_.VM/1048576)}},
@{ Label="WS(MB)"; Expression={[math]::floor($_.WS/1048576)}}
Thanks John. I will request increase to 64Gb Ram.
Many thanks Hein, I will pass this info on to our support team.
As an update - we have found the issue and am posting here as it may help someone else.
Our windows VM support team was able to wind back their monitoring software to 07:00 on the day of the issue and could see all available memory was taken. 838Mb by Replicate, but 23Gb by a process called tracerpt.exe.
Tracerpt appears to be an automated report generating tool looking at the windows system logs.
So we had a look at the logs (Event Viewer: Windows > System) and there was a repeating 'Service Control Manager' error every 30 seconds.
The error was 'Attunity Replicate Console Service' reporting that it could not start and would try again in 30 seconds.
Bearing in mind that Qlik Replicate hasn't used the 'Attunity' name for quite a few upgrades, this repeating error must have been going on for well over a year, every 30 seconds.
It appears the Replicate upgrade failed to fully remove the previous service.
So we believe the Tracerpt was trying to compile a report and used the 23Gb RAM. This was in competition with Replicate trying to deal with our non-committed 'long running transaction', causing Replicate to fail.
We have increased the RAM as a precaution, Disabled the 'auto re-start' on the 'Attunity Replicate Console Service' and also requested that the windows system logs are deleted or archived as soon as possible.
Re-test scheduled..