Jan 22, 2024 9:35:30 PM
Sep 21, 2021 4:47:46 PM
This article demonstrates how to monitor the performance of Web Services deployed in Talend Runtime. It explains the statistics generated by Talend Web Services and how to leverage the metrics using JMX and Nagios to improve the Quality of the Service.
The Metrics feature of the CXF management module provides aggregate statistics for Web Services running in the CXF Bus, such as response times, endpoint status, and throughput. Understanding these metrics facilitates the development of high-performance Web Services, and on Production systems it helps you find problems at an early stage and troubleshoot them faster.
Linux distribution (this example uses Ubuntu 17.10)
Active Internet connection, to download dependencies and installation packages
Talend ESB, installed
DemoService ESB (demo sample in Studio), deployed successfully
JConsole, for monitoring the metrics using JMX
Nagios installed and configured with ESB templates for monitoring (optional)
Nagios is an open source monitoring solution that allows users to identify infrastructure problems before they affect important business processes. Nagios monitors the entire IT infrastructure to ensure services, applications, and business processes are working as expected. Talend ESB can also be monitored using Nagios.
Install Nagios and Jmx4Perl.
Configure the CXF template files cxf.cfg, cxf-host.cfg, and jmx_commands.cfg, delivered with Talend Runtime in the directory TalendRuntimePath/add-ons/adapters/nagios. For details, see Talend ESB Nagios configuration template files.
Note: The CXF template files, located in the cxf_templates_nagios.zip file attached to this article, are customized for this demonstration. If you use these templates, modify the Critical and Warning status parameters to reflect your business needs.
Monitor host_esb to display the statistics of the host, for example, host status and up-time.
Monitoring Talend Runtime Server using Nagios
If you are not using a monitoring solution like Nagios, you can monitor the JMX metrics using JConsole.
If your Talend Runtime is installed on a remote server, enable JMX monitoring by modifying the settings in the container/etc/SERVICE_NAME-wrapper.conf file (for example, Talend-Runtime-wrapper.conf or karaf-wrapper.conf):
    # Uncomment or add below lines to enable JMX
    wrapper.java.additional.10=-Dcom.sun.management.jmxremote.port=1616
    wrapper.java.additional.11=-Dcom.sun.management.jmxremote.authenticate=false
    wrapper.java.additional.12=-Dcom.sun.management.jmxremote.ssl=false
    wrapper.java.additional.13=-Djava.rmi.server.hostname=HOST_NAME or IP_ADDRESS_OF_RUNTIME
Open JConsole, and navigate to MBeans > Metrics.Server.
Metrics MBeans for DemoService
The Metrics feature in Talend ESB is implemented using the Codahale metrics library, and provides several commonly used metric types such as Meter, Counter, Timer, and Histogram, to output metrics values. Before analyzing these metrics, you need to know some basics about the metric types.
Meter: measures the count and rate of event occurrences. A Meter reports the mean throughput and the one-, five-, and fifteen-minute exponentially weighted moving average throughputs.
MBeans type Meter and its attributes
nMinuteRate: returns the exponentially weighted moving average rate at which events occurred over the last n minutes, where n = 1, 5, or 15 (OneMinuteRate, FiveMinuteRate, and FifteenMinuteRate).
MeanRate: returns the average rate at which events have occurred since the meter was started.
An example from Wikipedia:
Comparison of common averages of values { 1, 2, 2, 3, 4, 7, 9 }

| Type | Description | Example | Result |
|---|---|---|---|
| Mean | Sum of values of a data set divided by number of values | (1+2+2+3+4+7+9) / 7 | 4 |
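The difference between the two Meter rates described above can be sketched in a few lines of Python. This is a simplified illustration, not the library's implementation: the class and function names are hypothetical, and the 5-second tick interval is an assumption made for the example.

```python
import math

TICK_SECONDS = 5.0  # assumed interval between rate updates


class MovingAverageRate:
    """Exponentially weighted moving average of an event rate (events/sec)."""

    def __init__(self, window_minutes):
        # Smoothing factor: a longer window reacts more slowly to new samples.
        self.alpha = 1.0 - math.exp(-TICK_SECONDS / 60.0 / window_minutes)
        self.rate = None   # no interval observed yet
        self.pending = 0   # events counted since the last tick

    def mark(self, n=1):
        """Record n events."""
        self.pending += n

    def tick(self):
        """Fold the events of the last interval into the moving average."""
        instant_rate = self.pending / TICK_SECONDS
        self.pending = 0
        if self.rate is None:
            self.rate = instant_rate
        else:
            self.rate += self.alpha * (instant_rate - self.rate)
        return self.rate


def mean_rate(total_events, elapsed_seconds):
    """Lifetime average: total events divided by the time since start."""
    return total_events / elapsed_seconds if elapsed_seconds > 0 else 0.0


# One minute of load at 100 events/sec, followed by one idle minute.
one_minute = MovingAverageRate(1)
fifteen_minute = MovingAverageRate(15)
total = 0
for step in range(24):                 # 24 ticks of 5 s = 120 s
    events = 500 if step < 12 else 0   # 500 events per tick = 100 events/sec
    total += events
    for meter in (one_minute, fifteen_minute):
        meter.mark(events)
        meter.tick()

lifetime = mean_rate(total, 24 * TICK_SECONDS)
```

In this scenario the one-minute rate has decayed to roughly 36.8 events/sec after the idle minute, the fifteen-minute rate is still about 93.6 events/sec, and the lifetime mean is 50 events/sec. The shorter the window, the faster the reading follows the current load.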
Counter: records increments and decrements. For example, counting the number of DemoService invocations.
MBeans type Counter and its attributes
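As a rough illustration of the Counter type, the sketch below uses one counter that only goes up (total invocations) and one that goes up when a request starts and down when it finishes (requests in flight). All names here (Counter, tracked_request, invocations, in_flight) are hypothetical and are not the CXF or metrics library API.

```python
import threading
from contextlib import contextmanager


class Counter:
    """Thread-safe counter that can be incremented and decremented."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def inc(self, n=1):
        with self._lock:
            self._value += n

    def dec(self, n=1):
        with self._lock:
            self._value -= n

    @property
    def count(self):
        with self._lock:
            return self._value


invocations = Counter()  # only ever incremented
in_flight = Counter()    # incremented on entry, decremented on exit


@contextmanager
def tracked_request():
    """Wrap the processing of one request and update both counters."""
    invocations.inc()
    in_flight.inc()
    try:
        yield
    finally:
        # Runs even if the request fails, so the in-flight count cannot leak.
        in_flight.dec()


with tracked_request():
    during = in_flight.count  # the request is still being processed
after = in_flight.count       # the request has finished
```

A steadily growing in-flight value means that requests arrive faster than they complete, which is why that counter is worth watching under load.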
Timer: provides duration statistics. It aggregates the Min, Mean, and Max durations since the start of the timer. It combines throughput statistics from a Meter, response time statistics from a Histogram, and count statistics from a Counter.
MBeans type Timer and its attributes
Histogram: keeps track of a stream of long values, and analyzes their statistical characteristics such as Max, Min, Mean, Median, standard deviation, 75th percentile, and 99th percentile. Statistics generated by Histogram can be retrieved using a Timer Object as Snapshots.
Mean: is the average response time an application takes to process a request.
Max: is the maximum response time an application takes to process a request from a user.
Min: is the minimum response time an application takes to return a response to a user.
Percentiles: are useful for giving the relative standing, that is, how one particular data value compares to the rest of the data. The Timer Object also provides metrics for 50th, 75th, 95th, and 98th percentile scores.
For example, after a service has been running continuously for a few days, its average response time could be low, for example 1 to 2 secs. However, the Max response time may go up to, for example, 60 secs, depending on the health of the server, the network, an increase in load, or the number of concurrent users.
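The effect described in this example, a low average next to a very high maximum, can be reproduced with a small Python sketch. The sample values are hypothetical, and the nearest-rank percentile used here is an assumption chosen for simplicity; the metrics library may calculate percentiles differently.

```python
import math
import statistics


def percentile(sorted_values, p):
    """Nearest-rank percentile (p between 0 and 100) of an ascending list."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(durations_ms):
    """Return the statistics a Histogram-style metric would expose."""
    ordered = sorted(durations_ms)
    return {
        "min": ordered[0],
        "max": ordered[-1],
        "mean": statistics.mean(ordered),
        "median": percentile(ordered, 50),
        "p75": percentile(ordered, 75),
        "p95": percentile(ordered, 95),
        "p98": percentile(ordered, 98),
    }


# 99 requests answered in 1 sec and a single request that took 60 secs.
samples = [1000] * 99 + [60000]
stats = summarize(samples)
```

With these samples the mean is 1590 ms and the 98th percentile is still 1000 ms, while the maximum is 60000 ms. Reading Mean, Max, and the percentiles together shows that the slow request is an outlier rather than the typical behavior.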
Web Services performance can be measured in terms of throughput, response times, latency, availability, and many other metrics. Higher throughput, lower latency, low response times, low error rate, and high availability are some of the characteristics every application should have.
Throughput: is the total number of transactions processed by the server in a given time. The time is calculated from the start of the first sample to the end of the last sample. Throughput is also measured in terms of the number of bytes exchanged per second.
Response Time: is the time the server takes to process a request before it returns the response.
Latency: is the additional delay involved for a processed request to reach the remote client. Latency increases if the network quality decreases.
Faults and Errors: must stay within an acceptable limit; they might increase with network issues and high load.
Availability: measures whether the Endpoint is reachable.
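The throughput definition above (number of transactions divided by the time from the start of the first sample to the end of the last sample) can be written as a short Python function. The function name and the (start, end) tuple layout are assumptions made for this sketch.

```python
def throughput(samples):
    """Transactions per second for a list of (start, end) times in seconds."""
    if not samples:
        return 0.0
    first_start = min(start for start, _ in samples)
    last_end = max(end for _, end in samples)
    elapsed = last_end - first_start
    return len(samples) / elapsed if elapsed > 0 else 0.0


# Four hypothetical requests spread over two seconds.
samples = [(0.0, 0.4), (0.5, 1.1), (1.0, 1.8), (1.5, 2.0)]
rate = throughput(samples)
```

Here four transactions complete within a two-second window, so the throughput is 2 transactions per second, regardless of how long each individual request took.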
| Measure | MBeans Type | Metric Attributes | Statistics to monitor |
|---|---|---|---|
| Throughput (events/sec) | type=Metrics.Server, Attribute=Totals | MeanRate, OneMinuteRate, FiveMinuteRate, FifteenMinuteRate | MeanRate: measures throughput from the start of the Timer, and may not represent the actual current load. nMinuteRate: measures the moving average throughput for the last n-minute. The nMinuteRate is useful to understand the real time load on the system for last 1, 5, and 15 minutes respectively. |
| Response Times (millisecs) | type=Metrics.Server, Attribute=Totals | Min, Max, Mean, Percentiles | Observes the deviations between the Min and Max response times. Percentiles can also be monitored if Average measurements don't reflect actual load conditions. |
| In Flight orders (count) | type=Metrics.Server, Attribute=Totals | Count | Returns count of pending requests. If the count is increasing, either the load on the system is increasing, or the server needs performance tuning. |
| Availability | type = Bus.Service.Endpoint | State | Endpoint is reachable if it is in the STARTED state. |
| Errors (count) | type=Metrics.Server, Attribute=Checked Application Faults, Attribute=Logical Runtime Faults, Attribute=Runtime Faults, Attribute=Unchecked Application Faults | Count | Returns count of checked, unchecked, logical, and Runtime Faults. |
| Number of Invocations (count) | type=Metrics.Server, Attribute=Totals | Count | Number of requests processed after the service is started. |
To get a better understanding of the performance of your Web Service, and to understand the Metric attributes, you should perform Load tests. Perform Load tests using many concurrent users for a minimum duration of 1 or 2 minutes, depending on your business requirements. JMeter and SoapUI are popular tools for performing Load tests of Web Services.
Server: Talend Runtime is rebooted to ensure that the metrics are reset to their default value of 0.
Service: DemoService is deployed in Karaf.
Nagios: Metric refresh interval is 90 secs (open source)
JMeter settings:
Goal
Observe the statistics generated by the metrics in multiple steps, and understand how the readings change with and without a Load on the server.
Capture screenshots in both JConsole and Nagios.
Test 1: Statistics generated one minute after triggering the Load test.
Note: The Load test is initiated, and the container is under a load of 10,000 requests.
Throughput
Response Times
NumOfInvocations
Conclusion:
nMinuteRate and the Max response time increased to a new high level.
In production scenarios, the InFlight or Pending orders should be closely monitored.
Test 2: Statistics generated two minutes after triggering Load test.
Note: Load test is complete, and the container is idle.
Throughput
Response Times
NumOfInvocations
Conclusion:
The nMinuteRate readings reflect the real time load on the system, and the Max response time remained the same.
The Max and Min response times help to analyze and read timeouts.
The Count from the InFlight and Error metric attributes can identify increasing load or performance issues.
Question: Can I monitor Talend ESB REST services?
Answer: Yes. CXF templates can be used for both SOAP and REST Data Services.
Question: Can I monitor Talend ESB Camel Routes?
Answer: Yes. Camel templates must be used for Routes using cSOAP and cREST components. For more information, see the Camel metrics templates.
Question: Why is the Nagios console displaying an alert or warning message?
Answer: The CXF template used in this demo is configured to display a critical alert if the number of transactions/sec exceeds 100, and a warning alert for more than 50 transactions/sec. As a result, the color for OneMinuteRate fluctuates between green, pink, and yellow.
    <Check OneMinuteRate>
      MBean = org.apache.cxf:bus.id=*,type=Metrics.Server,service="$1",port="$0",Attribute=Totals
      Attribute = OneMinuteRate
      Name = OneMinuteRate
      #no of events per sec
      Critical 100
      Warning 50
    </Check>
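The effect of the Critical and Warning values in the OneMinuteRate check above can be sketched in Python as follows. The function is a hypothetical illustration of the threshold logic, assuming that an alert is raised only when the reading exceeds the threshold; it is not part of Nagios or Jmx4Perl.

```python
OK, WARNING, CRITICAL = "OK", "WARNING", "CRITICAL"


def check_status(value, warning=50, critical=100):
    """Map a OneMinuteRate reading (events/sec) to an alert status."""
    if value > critical:
        return CRITICAL
    if value > warning:
        return WARNING
    return OK


# Hypothetical readings taken before, during, and after a load test.
readings = [30, 75, 120, 100]
statuses = [check_status(r) for r in readings]
```

With the thresholds from the template, the readings 30, 75, 120, and 100 map to OK, WARNING, CRITICAL, and WARNING respectively, which explains why the console color changes while a load test ramps up and down.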
Question: What other monitoring solutions are supported by Talend ESB?
Answer: Talend supports monitoring using the JMX protocol, and any vendor API that can query JMX while complying with standard security policies.