Distributed Message Service (Kafka)

Handling Service Overload

2024-05-09 09:04:22

Overview

High CPU usage and full disks indicate overloaded Kafka services.

l  High CPU usage degrades system performance and increases the risk of hardware damage.

l  If a disk is full, the Kafka log segments stored on it go offline. The partition replicas on that disk can no longer be read or written, reducing partition availability and fault tolerance. Leader partitions switch to other brokers, adding load to those brokers.

Causes of high CPU usage

l  Too many data operation threads are configured: num.io.threads, num.network.threads, and num.replica.fetchers.

l  Improper partition planning. One broker carries all production and consumption traffic.

Causes of full disks

l  The current disk space can no longer accommodate the rapidly growing service data volume.

l  Unbalanced disk usage. All produced messages land in one partition, filling up the disk that hosts it.

l  The data retention time set for a topic is too long. Old data takes up too much disk space.

Procedure

Handling high CPU usage:

l  Optimize the configuration of num.io.threads, num.network.threads, and num.replica.fetchers:

l  Set num.io.threads and num.network.threads to multiples of the number of disks, without exceeding the number of CPU cores.

l  Set num.replica.fetchers to a value no greater than 5.

l  Plan topic partitions properly. Set the number of partitions to a multiple of the number of broker nodes.

l  Attach a random suffix to each message key so that messages are evenly distributed across partitions.
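As an illustration of the thread-count guidance above, a broker with 4 data disks and 16 CPU cores might use a server.properties fragment like the following. The values are assumptions for this example, not recommendations for every cluster:

```properties
# Example broker thread settings for 4 data disks and 16 CPU cores.
# num.io.threads: a multiple of the disk count, below the CPU core count.
num.io.threads=8
# num.network.threads: a multiple of the disk count, below the CPU core count.
num.network.threads=4
# num.replica.fetchers: no greater than 5.
num.replica.fetchers=2
```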
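The random-suffix technique for spreading a hot key can be sketched in Java. This toy program (class and method names are made up for illustration) appends a random suffix to a single business key and shows that the resulting keys map to multiple partitions. A simple hash stands in for Kafka's real murmur2-based default partitioner; the spreading effect is the same:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ThreadLocalRandom;

public class KeySuffixExample {
    static final int NUM_PARTITIONS = 6; // hypothetical partition count

    // Append a random suffix so that one hot business key no longer
    // always hashes to the same partition.
    static String suffixedKey(String businessKey) {
        return businessKey + "-" + ThreadLocalRandom.current().nextInt(NUM_PARTITIONS);
    }

    // Illustrative stand-in for the partitioner: Kafka's default uses
    // murmur2 on the key bytes, but any uniform hash shows the effect.
    static int partitionFor(String key) {
        return Math.abs(key.hashCode() % NUM_PARTITIONS);
    }

    public static void main(String[] args) {
        Set<Integer> partitions = new HashSet<>();
        for (int i = 0; i < 200; i++) {
            partitions.add(partitionFor(suffixedKey("order-1001")));
        }
        // Without suffixing, "order-1001" would always hit one partition;
        // with suffixing, its messages spread across several.
        System.out.println("partitions used: " + partitions.size());
    }
}
```

Note that suffixing the key sacrifices per-key ordering, since messages with the same business key may now land in different partitions.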

Handling a full disk:

l  Increase the disk space.

l  Migrate partitions from the full disk to other disks on the node.

l  Set a proper data retention time for topics to decrease the volume of old data.

l  If CPU resources are sufficient, compress the data with compression algorithms.

l  Kafka producers support the GZIP, Snappy, LZ4, and ZSTD compression algorithms. When selecting one, weigh the compression ratio against the compression time: generally, an algorithm with a higher compression ratio consumes more time. For systems with high performance requirements, select a fast algorithm such as LZ4. For systems with high compression ratio requirements, select an algorithm such as GZIP.

Configure the compression.type parameter on producers to specify a compression algorithm.


import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("acks", "all");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
// Enable GZIP compression on the producer.
props.put("compression.type", "gzip");
Producer<String, String> producer = new KafkaProducer<>(props);
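For the partition migration step above, Kafka's kafka-reassign-partitions.sh tool accepts a JSON plan. A minimal sketch, assuming a topic named my-topic and a target log directory /data/disk2/kafka-logs (both hypothetical), moves partition 0's replica on broker 1 to the less-used disk:

```json
{
  "version": 1,
  "partitions": [
    {
      "topic": "my-topic",
      "partition": 0,
      "replicas": [1],
      "log_dirs": ["/data/disk2/kafka-logs"]
    }
  ]
}
```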
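Similarly, the retention step above can be applied per topic through topic-level configuration. A sketch, where the seven-day and 50 GB values are only examples:

```properties
# Delete log segments older than 7 days (7 * 24 * 3600 * 1000 ms).
retention.ms=604800000
# Optionally also cap retention by size per partition (50 GiB here).
retention.bytes=53687091200
```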

