After the Remote Engine microservice is invoked intensively (100+ times), it becomes unresponsive, with threads getting stuck and not progressing. A total of 12 microservices are deployed.
The customer has to undeploy the affected microservice and redeploy it to restore functionality. However, after another period of intensive usage, the same problem reoccurs.
Log trace and thread dump
http-nio-5081-exec-1
Stack Trace is:
java.lang.Thread.State: RUNNABLE
at java.net.SocketOutputStream.socketWrite0(java.base@11.0.24/Native Method)
at java.net.SocketOutputStream.socketWrite(java.base@11.0.24/SocketOutputStream.java:110)
at java.net.SocketOutputStream.write(java.base@11.0.24/SocketOutputStream.java:150)
at org.apache.logging.log4j.core.net.TcpSocketManager.writeAndFlush(TcpSocketManager.java:253)
at org.apache.logging.log4j.core.net.TcpSocketManager.write(TcpSocketManager.java:219)
- locked <0x00000006c72c6ed0> (a org.apache.logging.log4j.core.net.TcpSocketManager)
Each Microservice is an always-on daemon process that occupies a dedicated log collector worker thread. As more Microservices are deployed, and with other concurrent task executions consuming additional log collector workers, the default ms.worker.thread.number=10
no longer meets the performance requirement. It should be increased to avoid contention in the log collection process.
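As a rough sizing sketch based on the numbers in this case (the exact headroom reserved for concurrent tasks is an assumption for illustration):
12 deployed Microservices x 1 dedicated log collector worker each = 12 workers, which already exceeds the default pool of 10
20 workers - 12 dedicated workers = 8 workers remaining for concurrent task executions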
Please review the thread dump and adjust the ms.worker.thread.number property in the /etc/org.talend.ipaas.rt.dsrunner.log4jsocket.collector.cfg file from 10 to 20.
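For reference, the edit amounts to changing a single property in that file. The snippet below is a sketch assuming a default installation path; the value 20 follows the recommendation above and should be adjusted to your deployment:
# /etc/org.talend.ipaas.rt.dsrunner.log4jsocket.collector.cfg
# Default is 10; raised so the 12 deployed Microservices plus concurrent task executions each get a worker
ms.worker.thread.number=20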
When choosing the value, ms.worker.thread.number should account for the total number of deployed Microservices plus the number of concurrent tasks expected in high-load situations. For more information about Data Service Runner configuration files, please refer to the Talend Help Documentation:
Configuring-data-service-runner