Solved: Re: Out of Virtual Memory - Qlik Community

IanM · ‎2023-08-10

Hi, we have an Oracle to Oracle replication with a reasonably high CDC throughput.

We have a single uncommitted transaction of around 4.5Mil changes, waiting for source commit.

We then get an error:

Stream component 'st_0_transaction_sorter' terminated
Stream component failed at subtask 0, component st_0_transaction_sorter
Error executing data handler
The Transaction Storage Swap cannot write Event

and checking the windows event logs we have:

'Windows - Out of Virtual Memory'

I'm surprised this happens. I expected Replicate to switch to using disk / tswp files before this would occur.

It seems it might be a competing windows process scheduled to start at 07:00, as we have seen this a number of times at exactly 07:00.

We have 32Gb Ram on the VM with 8Gb Paging File size and Gigabytes of free disk space.

I have previously adjusted my 'common settings' to:

"common_settings": {

"stream_buffers_number": 8,

"stream_buffer_size": 40,

I can ask for the virtual memory to be increased.

I'm concerned that by just increasing the virtual memory, it may just go wrong again once that limit is reached.

Guidance please?

Ian

john_wang · ‎2023-08-10

Hello @IanM ,

Welcome to Qlik Community forum and thanks for reaching out here.

In general, Qlik Replicate fully utilizes the memory for data processing, certainly it depends on tasks settings and the server memory resources. Let's focus on a single task first; if there are multiple tasks running in parallel on the same server, then they share the CPU/Memory/HDD space etc resources and they may impact each other.

In your scenario, when the huge transaction (around 4.5Mil changes) start, Replicate will start getting the changes (by reading Oracle redo log) and try to cache the changes (chunk by chunk) in memory to wait for submit, or rollback transaction flag. If the current allocated memory is insufficient to hold the transaction then Replicate will ask for additional memory allocation: If there are enough memory to allocate to Replicate, then keep the change data in memory; Repeat above step until the memory allocation reached task memory setting "Total transactions memory size exceeds", then the data will be offload from memory to harddisk, see Transaction Offload Tuning.
However while OS trying to allocate more memory to Replicate this time, there are no available memory for allocate, we got 'Windows - Out of Virtual Memory'.

Looks to me the 32G RAM with Gigabytes of HDD are insufficient for so high CDC throughput and big transaction. This is the Recommended hardware configuration, Extra-Large system or more powerful server is highly recommended.

Hope this helps.

Regards,

John.

Help users find answers! Do not forget to mark a solution that worked for you! If already marked, give it a thumbs up!

View solution in original post

Heinvandenheuvel · ‎2023-08-10

The error generates from Windows itself being asked too much, which may or might not be due the Replicate configurations.

Clearly, at least for the short while the active processes needed more than 32GB + 8GB = 40 GB. It could be this was due to allowing the Replicate tasks to grow too big, for example the "Task settings - Change Processing Tuning - Force apply a batch when processing memory exceeds (MB): " has a direct effect on the memory you allow replicate tasks to allocate. Similarly Full Load Tuning - "Maximum number of tables" times "Commit rate during full load:" if you go silly there, the system will behave silly as requested.

That 8GB swapspace is sort of a reserve or overflow area which could simply be increase to also be for example 32 GB. If you dip into that, then you may find that the system will just slow down dramatically instead of programs crashing out when using disks instead of memory. Better perhaps, but not great. Best is to control (limit) the programs themselves or the number/type of programs requested to run at the same time.

Best is to work with your Server Manager to have a look at which process uses what memory leading up to the problem moment. They may have specialist tools with historic usage views available, or they may use Windows primitive like "tasklist | sort /+64 " or "Get-WmiObject Win32_Process". Examples of both usages below.

Hein.

CMD - as administrator 
C:\scripts>tasklist | sort /+64

System Idle Process              0 Services                   0          8 K
smss.exe                       404 Services                   0        380 K
SystemSettings.exe            5312 Console                    1        516 K
:
repctl.exe                   15488 Services                   0    177,420 K
:
chrome.exe                    1736 Console                    1    456,792 K
sqlservr.exe                  6312 Services                   0    501,672 K
chrome.exe                    2160 Console                    1    535,064 K
chrome.exe                    8184 Console                    1    746,148 K
Image Name                     PID Session Name        Session#    Mem Usage
oracle.exe                    6184 Services                   0  1,607,756 K
========================= ======== ================ =========== ============

PowerShell as Adminstrator:

PS C:\scripts> ./vm

  Pid HEINV-LPW10 2023-08-10 15:41:28 Elapsed(s) CPU(s)    Reads Writes VM(MB) WS(MB)
  --- ------------------------------- ---------- ------    ----- ------ ------ ------
 8352 C:\Program Files\Attunity\Repl     4566359   2070     7335     57    782    133
 3168 C:\Program Files\Attunity\Repl     4566351   3481 30509240   1272    357     68
15488 Oracle_to_Oracle 127.0.0.1:355      510645    110    81700  29553    464    173

Notice the Working Set (actually in memory) for the single Replicate Task
on this server pid=15488 reported by both as about 170MB. 
The Powershell script also shows the Virtual Memory reserved by the task
That is 464 MB, a lot more than the Working Set.

Script itself:

PS C:\scripts> type vm.ps1
$tmp = $env:COMPUTERNAME + ' ' + (Get-Date -Format "yyyy-MM-dd HH:mm:ss")
Get-WmiObject Win32_Process -filter "name like 'rep%.exe'" | Format-Table -AutoSize @{Label="Pid"; Expression={$_.ProcessId}},
 @{ label=$tmp; expression={ (($_.CommandLine -replace("^.*?reptasksrv","")) + ' '*30 ).substring(1,30) }},
# @{ label=$tmp; expression={ (($_.CommandLine).substring(1,60) )}},
 @{ Label="Elapsed(s)"; Expression={[math]::floor((New-TimeSpan $_.ConvertToDateTime($_.CreationDate)).TotalSeconds)}},
 @{ Label="CPU(s)"; Expression={[math]::floor(($_.UserModeTime + $_.KernelModeTime) / 10000000)}},
 @{ Label="Reads"; Expression={$_.ReadOperationCount}},
 @{ Label="Writes"; Expression={$_.WriteOperationCount}},
 @{ Label="VM(MB)"; Expression={[math]::floor($_.VM/1048576)}},
 @{ Label="WS(MB)"; Expression={[math]::floor($_.WS/1048576)}}

View solution in original post

IanM · ‎2023-08-14

As an update - we have found the issue and am posting here as it may help someone else.

Our windows VM support team was able to wind back their monitoring software to 07:00 on the day of the issue and could see all available memory was taken. 838Mb by Replicate, but 23Gb by a process called tracerpt.exe.

Tracerpt appears to be an automated report generating tool looking at the windows system logs.

So we had a look at the logs (Event Viewer: Windows > System) and there was a repeating 'Service Control Manager' error every 30 seconds.

The error was 'Attunity Replicate Console Service' reporting that it could not start and would try again in 30 seconds.

Bearing in mind that Qlik Replicate hasn't used the 'Attunity' name for quite a few upgrades, this repeating error must have been going on for well over a year, every 30 seconds.

It appears the Replicate upgrade failed to fully remove the previous service.

So we believe the Tracerpt was trying to compile a report and used the 23Gb RAM. This was in competition with Replicate trying to deal with our non-committed 'long running transaction', causing Replicate to fail.

We have increased the RAM as a precaution, Disabled the 'auto re-start' on the 'Attunity Replicate Console Service' and also requested that the windows system logs are deleted or archived as soon as possible.

Re-test scheduled..

View solution in original post

john_wang · ‎2023-08-10

Hello @IanM ,

Welcome to Qlik Community forum and thanks for reaching out here.

In general, Qlik Replicate fully utilizes the memory for data processing, certainly it depends on tasks settings and the server memory resources. Let's focus on a single task first; if there are multiple tasks running in parallel on the same server, then they share the CPU/Memory/HDD space etc resources and they may impact each other.

In your scenario, when the huge transaction (around 4.5Mil changes) start, Replicate will start getting the changes (by reading Oracle redo log) and try to cache the changes (chunk by chunk) in memory to wait for submit, or rollback transaction flag. If the current allocated memory is insufficient to hold the transaction then Replicate will ask for additional memory allocation: If there are enough memory to allocate to Replicate, then keep the change data in memory; Repeat above step until the memory allocation reached task memory setting "Total transactions memory size exceeds", then the data will be offload from memory to harddisk, see Transaction Offload Tuning.
However while OS trying to allocate more memory to Replicate this time, there are no available memory for allocate, we got 'Windows - Out of Virtual Memory'.

Looks to me the 32G RAM with Gigabytes of HDD are insufficient for so high CDC throughput and big transaction. This is the Recommended hardware configuration, Extra-Large system or more powerful server is highly recommended.

Hope this helps.

Regards,

John.

Help users find answers! Do not forget to mark a solution that worked for you! If already marked, give it a thumbs up!

Heinvandenheuvel · ‎2023-08-10

The error generates from Windows itself being asked too much, which may or might not be due the Replicate configurations.

Clearly, at least for the short while the active processes needed more than 32GB + 8GB = 40 GB. It could be this was due to allowing the Replicate tasks to grow too big, for example the "Task settings - Change Processing Tuning - Force apply a batch when processing memory exceeds (MB): " has a direct effect on the memory you allow replicate tasks to allocate. Similarly Full Load Tuning - "Maximum number of tables" times "Commit rate during full load:" if you go silly there, the system will behave silly as requested.

That 8GB swapspace is sort of a reserve or overflow area which could simply be increase to also be for example 32 GB. If you dip into that, then you may find that the system will just slow down dramatically instead of programs crashing out when using disks instead of memory. Better perhaps, but not great. Best is to control (limit) the programs themselves or the number/type of programs requested to run at the same time.

Best is to work with your Server Manager to have a look at which process uses what memory leading up to the problem moment. They may have specialist tools with historic usage views available, or they may use Windows primitive like "tasklist | sort /+64 " or "Get-WmiObject Win32_Process". Examples of both usages below.

Hein.

CMD - as administrator 
C:\scripts>tasklist | sort /+64

System Idle Process              0 Services                   0          8 K
smss.exe                       404 Services                   0        380 K
SystemSettings.exe            5312 Console                    1        516 K
:
repctl.exe                   15488 Services                   0    177,420 K
:
chrome.exe                    1736 Console                    1    456,792 K
sqlservr.exe                  6312 Services                   0    501,672 K
chrome.exe                    2160 Console                    1    535,064 K
chrome.exe                    8184 Console                    1    746,148 K
Image Name                     PID Session Name        Session#    Mem Usage
oracle.exe                    6184 Services                   0  1,607,756 K
========================= ======== ================ =========== ============

PowerShell as Adminstrator:

PS C:\scripts> ./vm

  Pid HEINV-LPW10 2023-08-10 15:41:28 Elapsed(s) CPU(s)    Reads Writes VM(MB) WS(MB)
  --- ------------------------------- ---------- ------    ----- ------ ------ ------
 8352 C:\Program Files\Attunity\Repl     4566359   2070     7335     57    782    133
 3168 C:\Program Files\Attunity\Repl     4566351   3481 30509240   1272    357     68
15488 Oracle_to_Oracle 127.0.0.1:355      510645    110    81700  29553    464    173

Notice the Working Set (actually in memory) for the single Replicate Task
on this server pid=15488 reported by both as about 170MB. 
The Powershell script also shows the Virtual Memory reserved by the task
That is 464 MB, a lot more than the Working Set.

Script itself:

PS C:\scripts> type vm.ps1
$tmp = $env:COMPUTERNAME + ' ' + (Get-Date -Format "yyyy-MM-dd HH:mm:ss")
Get-WmiObject Win32_Process -filter "name like 'rep%.exe'" | Format-Table -AutoSize @{Label="Pid"; Expression={$_.ProcessId}},
 @{ label=$tmp; expression={ (($_.CommandLine -replace("^.*?reptasksrv","")) + ' '*30 ).substring(1,30) }},
# @{ label=$tmp; expression={ (($_.CommandLine).substring(1,60) )}},
 @{ Label="Elapsed(s)"; Expression={[math]::floor((New-TimeSpan $_.ConvertToDateTime($_.CreationDate)).TotalSeconds)}},
 @{ Label="CPU(s)"; Expression={[math]::floor(($_.UserModeTime + $_.KernelModeTime) / 10000000)}},
 @{ Label="Reads"; Expression={$_.ReadOperationCount}},
 @{ Label="Writes"; Expression={$_.WriteOperationCount}},
 @{ Label="VM(MB)"; Expression={[math]::floor($_.VM/1048576)}},
 @{ Label="WS(MB)"; Expression={[math]::floor($_.WS/1048576)}}

IanM · ‎2023-08-11

Thanks John. I will request increase to 64Gb Ram.

IanM · ‎2023-08-11

Many thanks Hein, I will pass this info on to our support team.

IanM · ‎2023-08-14