The Kafka target endpoint uses the librdkafka library to provide a Producer client to Kafka clusters from Replicate. The librdkafka library is able to emit internal metrics at regular intervals if the Producer is configured accordingly. Today, the Kafka target endpoint does not expose those metrics for their subsequent analysis.
To make the Producer return the metrics, set the statistics.interval.ms configuration property to a value greater than 0 and register a callback (normally through the stats_cb property) that handles storage of the emitted information.
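As a minimal sketch of that mechanism, the Python snippet below shows the shape of such a callback. It assumes the convention used by librdkafka-based clients such as confluent-kafka-python, where the callback receives each statistics dump as a JSON string. The names on_stats and latest_stats and the broker address are illustrative only, and nothing here reflects how Replicate is implemented internally.

```python
import json

# Holds the most recent statistics dump (hypothetical storage; a real
# integration might write to a log file or a metrics system instead).
latest_stats = {}


def on_stats(json_str):
    """Parse one statistics dump and keep it for later analysis."""
    stats = json.loads(json_str)
    latest_stats.clear()
    latest_stats.update(stats)


# Producer configuration: a positive interval enables the statistics,
# and stats_cb names the function that receives each dump.
producer_config = {
    "bootstrap.servers": "localhost:9092",  # assumed broker address
    "statistics.interval.ms": 5000,         # emit a dump every 5 seconds
    "stats_cb": on_stats,
}

# Simulate one dump arriving, without needing a running Kafka cluster.
on_stats('{"name": "rdkafka#producer-1", "type": "producer", "tx": 12}')
print(latest_stats["tx"])
```

With confluent-kafka-python, a dictionary like this would be passed to the Producer constructor, and the callback would then be served from poll() or flush(). The snippet calls on_stats directly so that it runs without a cluster.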
The feature request consists of:
Access to the advanced telemetry is important at every stage of a use case implementation, and most of all during performance testing and fine-tuning. The granularity provided by librdkafka statistics is essential for configuration and performance analysis. It would make it possible to anticipate throughput and latency issues before they reach production, and to find root causes when production issues do occur.
Telemetry description
The advanced Kafka telemetry exposed by librdkafka has the following levels, each described below: top-level, brokers, topics, and partitions.
As most operations are windowed operations (operating on slices of time), the Brokers and Topics levels include Window stats: moving average, smallest and largest values, sum of values, percentile values, etc.
Each level provides valuable telemetry about librdkafka acting as a producer, and therefore about Replicate as a producer. Below are examples of fields that provide information on producer performance:
Top-level
Field | Type | Description |
tx | int | Total number of requests sent to Kafka brokers |
tx_bytes | int | Total number of bytes transmitted to Kafka brokers |
rx | int | Total number of responses received from Kafka brokers |
rx_bytes | int | Total number of bytes received from Kafka brokers |
txmsgs | int | Total number of messages transmitted (produced) to Kafka brokers |
txmsg_bytes | int | Total number of message bytes (including framing, such as per-Message framing and MessageSet/batch framing) transmitted to Kafka brokers |
rxmsgs | int | Total number of messages consumed, not including ignored messages (due to offset, etc), from Kafka brokers. |
rxmsg_bytes | int | Total number of message bytes (including framing) received from Kafka brokers |
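Because the top-level fields are running totals, throughput has to be derived from the difference between two consecutive dumps. The following sketch illustrates the arithmetic. The function name counter_rates and the sample values are made up for the example, and the interval is assumed to match statistics.interval.ms.

```python
def counter_rates(previous, current, interval_s):
    """Per-second rates for the cumulative top-level counters."""
    fields = ("tx", "tx_bytes", "rx", "rx_bytes", "txmsgs", "txmsg_bytes")
    return {f: (current[f] - previous[f]) / interval_s for f in fields}


# Two hypothetical dumps taken 5 seconds apart (values are invented).
prev = {"tx": 100, "tx_bytes": 50_000, "rx": 100, "rx_bytes": 8_000,
        "txmsgs": 2_000, "txmsg_bytes": 400_000}
curr = {"tx": 150, "tx_bytes": 75_000, "rx": 150, "rx_bytes": 12_000,
        "txmsgs": 3_000, "txmsg_bytes": 600_000}

rates = counter_rates(prev, curr, 5.0)
print(rates["txmsgs"], "messages/s")
print(rates["tx_bytes"], "bytes/s")
```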
Brokers
Field | Type | Description |
state | string | Broker state (INIT, DOWN, CONNECT, AUTH, APIVERSION_QUERY, AUTH_HANDSHAKE, UP, UPDATE) |
stateage | int gauge | Time since last broker state change (microseconds) |
outbuf_cnt | int gauge | Number of requests awaiting transmission to broker |
outbuf_msg_cnt | int gauge | Number of messages awaiting transmission to broker |
waitresp_cnt | int gauge | Number of requests in-flight to broker awaiting response |
waitresp_msg_cnt | int gauge | Number of messages in-flight to broker awaiting response |
tx | int | Total number of requests sent |
txbytes | int | Total number of bytes sent |
txerrs | int | Total number of transmission errors |
txretries | int | Total number of request retries |
req_timeouts | int | Total number of requests timed out |
rx | int | Total number of responses received |
rxbytes | int | Total number of bytes received |
rxerrs | int | Total number of receive errors |
rxcorriderrs | int | Total number of unmatched correlation ids in response (typically for timed out requests) |
rxpartial | int | Total number of partial MessageSets received. The broker may return partial responses if the full MessageSet could not fit in the remaining Fetch response size. |
disconnects | int | Number of disconnects (triggered by broker, network, load-balancer, etc.). |
int_latency | object | Internal producer queue latency in microseconds. See Window stats below |
outbuf_latency | object | Internal request queue latency in microseconds. This is the time between when a request is enqueued on the transmit (outbuf) queue and when it is written to the TCP socket. Additional buffering and latency may be incurred by the TCP stack and network. See Window stats below |
rtt | object | Broker latency / round-trip time in microseconds. See Window stats below |
throttle | object | Broker throttling time in milliseconds. See Window stats below |
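One way the broker-level gauges could be used is a simple health check over a parsed dump. The sketch below assumes the dump has a "brokers" object keyed by broker name, as in the librdkafka statistics JSON. The function name, the thresholds, and the sample broker names are arbitrary choices for illustration, not recommended values.

```python
def brokers_needing_attention(stats, max_outbuf=100, max_waitresp=50):
    """Return {broker_name: [reasons]} for brokers that look unhealthy."""
    flagged = {}
    for name, broker in stats.get("brokers", {}).items():
        reasons = []
        if broker.get("state") != "UP":
            reasons.append("state is %s" % broker.get("state"))
        if broker.get("outbuf_cnt", 0) > max_outbuf:
            reasons.append("outbuf_cnt=%d" % broker["outbuf_cnt"])
        if broker.get("waitresp_cnt", 0) > max_waitresp:
            reasons.append("waitresp_cnt=%d" % broker["waitresp_cnt"])
        if reasons:
            flagged[name] = reasons
    return flagged


# Hypothetical dump fragment with one healthy and one failing broker.
sample = {
    "brokers": {
        "broker-a:9092/1": {"state": "UP", "outbuf_cnt": 3, "waitresp_cnt": 2},
        "broker-b:9092/2": {"state": "DOWN", "outbuf_cnt": 250, "waitresp_cnt": 0},
    }
}

flagged = brokers_needing_attention(sample)
print(flagged)
```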
Topics
Field | Type | Description |
batchsize | object | Batch sizes in bytes. See Window stats below |
batchcnt | object | Batch message counts. See Window stats below |
partitions | object | Partitions dict, key is partition id. See partitions below. |
Partitions
Field | Type | Description |
msgq_cnt | int gauge | Number of messages waiting to be produced in first-level queue |
msgq_bytes | int gauge | Number of bytes in msgq_cnt |
xmit_msgq_cnt | int gauge | Number of messages ready to be produced in transmit queue |
xmit_msgq_bytes | int gauge | Number of bytes in xmit_msgq |
fetchq_cnt | int gauge | Number of pre-fetched messages in fetch queue |
fetchq_size | int gauge | Bytes in fetchq |
committed_offset | int gauge | Last committed offset |
txmsgs | int | Total number of messages transmitted (produced) |
txbytes | int | Total number of bytes transmitted for txmsgs |
rxbytes | int | Total number of bytes received for rxmsgs |
msgs | int | Total number of messages received (consumer, same as rxmsgs), or total number of messages produced (possibly not yet transmitted) (producer). |
msgs_inflight | int gauge | Current number of messages in-flight to/from broker |
next_ack_seq | int gauge | Next expected acked sequence (idempotent producer) |
next_err_seq | int gauge | Next expected errored sequence (idempotent producer) |
acked_msgid | int | Last acked internal message id (idempotent producer) |
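The partition gauges can be rolled up per topic to see where messages are accumulating inside the producer. This is a rough sketch that assumes the nesting described above (a "topics" object whose entries each hold a "partitions" object keyed by partition id). The function name and the sample data are hypothetical.

```python
def queued_messages_by_topic(stats):
    """Sum first-level and transmit queue depths over each topic's partitions."""
    totals = {}
    for topic, topic_data in stats.get("topics", {}).items():
        partitions = topic_data.get("partitions", {})
        totals[topic] = sum(
            p.get("msgq_cnt", 0) + p.get("xmit_msgq_cnt", 0)
            for p in partitions.values()
        )
    return totals


# Hypothetical dump fragment: two topics, invented queue depths.
sample = {
    "topics": {
        "orders": {"partitions": {
            "0": {"msgq_cnt": 10, "xmit_msgq_cnt": 5},
            "1": {"msgq_cnt": 0, "xmit_msgq_cnt": 2},
        }},
        "audit": {"partitions": {
            "0": {"msgq_cnt": 0, "xmit_msgq_cnt": 0},
        }},
    }
}

backlog = queued_messages_by_topic(sample)
print(backlog)
```

A total that keeps growing from one dump to the next would suggest that messages are being produced faster than they are being sent to the brokers.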
Window stats
Field | Type | Description |
min | int gauge | Smallest value |
max | int gauge | Largest value |
avg | int gauge | Average value |
sum | int gauge | Sum of values |
cnt | int gauge | Number of values sampled |
stddev | int gauge | Standard deviation (based on histogram) |
hdrsize | int gauge | Memory size of Hdr Histogram |
p50 | int gauge | 50th percentile |
p75 | int gauge | 75th percentile |
p90 | int gauge | 90th percentile |
p95 | int gauge | 95th percentile |
p99 | int gauge | 99th percentile |
p99_99 | int gauge | 99.99th percentile |
outofrange | int gauge | Values skipped due to out of histogram range |
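Most window stats objects listed above (int_latency, outbuf_latency, rtt) are reported in microseconds, so a small conversion helps when reading them. The helper below is only an illustration. Its name, the chosen fields, and the sample values are assumptions, and it should not be applied to throttle, which is already in milliseconds.

```python
def window_to_ms(window, fields=("min", "avg", "p50", "p95", "p99", "max")):
    """Convert selected microsecond window stats to milliseconds."""
    return {f: window[f] / 1000.0 for f in fields if f in window}


# Hypothetical rtt window from one broker (microseconds, invented values).
rtt = {"min": 500, "avg": 1500, "p99": 12000, "cnt": 42}

rtt_ms = window_to_ms(rtt)
print(rtt_ms)
```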
Telemetry example
Attached is an example of the information returned in every statistics dump at regular intervals.