I've taken over maintenance of a Qlik Sense (QS) v3.2 environment with synchronized persistence.
I'm having an intermittent but fairly frequent issue where the QMC on my central node shows only "1 of 1 services running" for a rim node when in fact all services on that node are running. When it happens, the repository log gives a warning:
Command=Check service status;Result=500;ResultText=Error: Unable to connect to the remote server
/qps/servicestatusworker Check service status 500 Method: 'SendRimQrsStatusRequest'. Failed to retrieve service status from 'http://host.domain.com:4444/status/'. Server host 'host.domain.com'. Error message: 'Unable to connect to the remote server'
The one service still shown as running in the QMC is always the QRS on the rim node.
The servers are spread globally: the central node is in EMEA (on-premise), and the rim nodes are in APAC (Azure) and the US (on-premise). The problem occurs randomly on all servers outside the central EMEA datacenter. The error disappears by itself - usually after a day or so - but I'm unable to isolate when, how and why.
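To help narrow down the "when", one option would be to run a small probe from the central node that periodically tries to open a TCP connection to the rim node on the port named in the log (4444) and writes a timestamped result. The following is only a minimal sketch: the host name is the placeholder from the log above, and the 60-second interval and 5-second timeout are arbitrary assumptions. It only checks whether the port is reachable; it does not authenticate or call the status endpoint itself.

```python
import socket
import time
from datetime import datetime, timezone


def probe(host, port, timeout=5.0):
    """Attempt a plain TCP connect and return (ok, detail)."""
    started = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            elapsed_ms = (time.monotonic() - started) * 1000
            return True, f"connected in {elapsed_ms:.0f} ms"
    except OSError as exc:
        # Covers refused connections, timeouts and DNS failures alike.
        return False, f"{type(exc).__name__}: {exc}"


def format_result(host, port, ok, detail, now=None):
    """Build one log line with a UTC timestamp."""
    now = now or datetime.now(timezone.utc)
    status = "OK" if ok else "FAIL"
    return f"{now.isoformat(timespec='seconds')} {host}:{port} {status} {detail}"


if __name__ == "__main__":
    RIM_HOSTS = ["host.domain.com"]  # placeholder taken from the log line
    PORT = 4444                      # port shown in the warning
    INTERVAL_SECONDS = 60            # assumed polling interval

    while True:
        for host in RIM_HOSTS:
            ok, detail = probe(host, PORT)
            print(format_result(host, PORT, ok, detail), flush=True)
        time.sleep(INTERVAL_SECONDS)
```

Redirecting the output to a file and comparing the FAIL timestamps with the repository log warnings should show whether the QMC status drops coincide with real network-level connection failures between the datacenters, or whether the port stays reachable while the status check still fails.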
I've found one article stating this might be due to the servers not having their system time in sync, but unfortunately that is not the case here.