Distributed Message Service (Kafka)

Features

2024-05-09 03:35:46

Message Receiving and Sending

- Message sending: supports sending to a specified partition, synchronous sending, asynchronous sending, and partitioned batch sending. Authentication is supported: when a client connects to a broker, SSL or SASL authenticates the connection and data transmission is encrypted. The ACL mechanism provides clients with read/write permission control. Message bodies can be compressed, reducing the amount of data transmitted over the network and improving Kafka's message-sending throughput.

- Message consumption: supports consumption from specified partitions or offsets. The poll method supports batch consumption, and message broadcasting is also supported.
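The benefit of compressing message bodies can be shown outside Kafka with a small standard-library sketch. gzip stands in here for any codec Kafka supports (gzip, snappy, lz4, zstd); the payload is an invented, repetitive example:

```python
import gzip

# A batch of repetitive messages, typical of log or order-flow payloads
# (hypothetical example data).
batch = "\n".join(f"orderId=42,status=SHIPPED,seq={i}" for i in range(1000)).encode()

compressed = gzip.compress(batch)

# Repetitive payloads compress well, so far fewer bytes cross the network.
print(len(compressed) < len(batch))  # True
```

The same trade-off applies inside Kafka: CPU spent compressing is exchanged for network and disk savings.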

Message Order

Depending on business requirements, message ordering falls into two categories: global order and local order.

- Global order: All messages in a topic are consumed in the order in which they are produced.

- Local order: Messages in a topic that share the same business field must be consumed in the order in which they are produced. For example, suppose the topic's messages are order flow records, each carrying an orderId parameter. The business requires that messages with the same orderId value be consumed in their production order.

Because a Kafka topic can be divided into multiple partitions, messages from producers are scattered across different partitions. Although a producer sends messages to the broker in order, the messages may land in different partitions once inside Kafka. Therefore, to guarantee global order, a topic must have exactly one partition.

To implement local order, a partition key is required when sending messages. Kafka hashes the key and uses the result to decide which partition the message goes to, so messages with the same partition key always land in the same partition. In this case, you can configure multiple partitions to improve the topic's overall throughput.
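This keyed routing can be sketched with a toy partitioner. Kafka's default partitioner actually uses a murmur2 hash of the key bytes; the CRC32 hash below is only a stand-in to show the mechanism:

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a partition key to a partition index (illustrative; not Kafka's murmur2)."""
    return zlib.crc32(key) % num_partitions

NUM_PARTITIONS = 3  # multiple partitions still preserve per-key (local) order

# Messages with the same orderId always land in the same partition...
p1 = partition_for(b"orderId-1001", NUM_PARTITIONS)
p2 = partition_for(b"orderId-1001", NUM_PARTITIONS)
print(p1 == p2)  # True

# ...while different keys may be spread across partitions for throughput.
spread = {partition_for(f"orderId-{i}".encode(), NUM_PARTITIONS) for i in range(100)}
print(sorted(spread))
```

Because the mapping is deterministic, all messages for one orderId share a partition and are consumed in production order, while unrelated orders are processed in parallel.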

Consumer Group

In Kafka, some topics hold millions or even tens of millions of messages. If they are consumed by only one consumer process, consumption can be extremely slow. Kafka's consumer group feature allows each partition to be consumed by only one consumer in a group, while one consumer in the group may consume multiple partitions. Consumer groups offer three advantages: higher consumption efficiency, flexible consumption modes, and convenient fault recovery.
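The one-partition-per-consumer rule can be sketched as a simple round-robin assignment. This is a simplification of Kafka's real assignors (range, round-robin, sticky), but it shows the invariant:

```python
def assign(partitions, consumers):
    """Round-robin sketch: each partition goes to exactly one consumer;
    a consumer may own several partitions."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# 6 partitions shared by a group of 2 consumers: each consumer gets 3.
result = assign(list(range(6)), ["consumer-a", "consumer-b"])
print(result)  # {'consumer-a': [0, 2, 4], 'consumer-b': [1, 3, 5]}
```

Adding a consumer to the group spreads the same partitions over more processes, which is what raises consumption throughput; partitions owned by a failed consumer are simply reassigned to the survivors.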

Message Backtracking

When a Kafka message is written to a broker, it is stored for a period of time. If a consumer misses messages during this period (for example, because it committed the offset too early, hit a delivery fault, or underwent a rebalance), you can issue a seek command to the consumer to make it re-consume from a specific offset.
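The effect of seeking can be modelled with an in-memory log: the consumer keeps a position, and seeking simply moves that position so retained messages are delivered again. This is a toy model, not the client API (in the Java client the call is Consumer.seek(TopicPartition, offset)):

```python
class ToyConsumer:
    """Minimal model of a consumer position over a retained partition log."""

    def __init__(self, log):
        self.log = log        # messages still retained on the broker
        self.position = 0     # next offset to read

    def poll(self, max_records=2):
        batch = self.log[self.position:self.position + max_records]
        self.position += len(batch)
        return batch

    def seek(self, offset):
        # Backtrack (or skip ahead) to an explicit offset.
        self.position = offset

consumer = ToyConsumer(["m0", "m1", "m2", "m3"])
first = consumer.poll()     # ['m0', 'm1']
consumer.seek(0)            # offset committed too early? rewind and replay
replayed = consumer.poll()  # ['m0', 'm1'] again
print(first == replayed)    # True
```

Backtracking only works while the segments holding those offsets have not yet been cleaned up, which is why the retention settings in the next section matter.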

Message Cleanup

Each Kafka partition has a corresponding log file. Each log is divided into multiple log segments, whose sizes are configurable. When the set size is exceeded, the current file is closed and a new one is opened. Segments cannot be stored permanently; they are cleaned up when certain conditions are met. Kafka has three cleanup modes: time-based, log-size-based, and log-start-offset-based.

1) Time-based mode: Check whether the retention time of each log file exceeds the specified threshold, controlled by the log.retention.hours, log.retention.minutes, and log.retention.ms parameters. Logs exceeding the retention time are deleted. For example, if log.retention.hours is set to 24, segment data from the past 24 hours is retained and all older segments are deleted. Note that the active segment, meaning the segment currently being written, is never deleted; this also applies to the following modes.

2) Log-size-based mode: Check whether the total log size exceeds the threshold set by log.retention.bytes (log.segment.bytes sets the size of a single segment). The oldest segments are deleted until the total size falls below the threshold.

3) Log-start-offset-based mode: Check whether a segment's end offset (the base offset of the next segment) is less than or equal to the log start offset. Segments meeting this condition are deleted.
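The time-based mode, for instance, can be simulated over a list of closed segments plus one active segment. This is a simplified model (real Kafka also applies the size- and offset-based checks, and tracks timestamps per record batch); segment names and timestamps below are invented:

```python
import time

RETENTION_SECONDS = 24 * 3600  # e.g. log.retention.hours=24

def clean_by_time(segments, now):
    """Keep the active (last) segment unconditionally; drop closed
    segments whose newest record is older than the retention window."""
    *closed, active = segments
    kept = [s for s in closed if now - s["last_ts"] <= RETENTION_SECONDS]
    return kept + [active]

now = time.time()
segments = [
    {"name": "00000000.log", "last_ts": now - 3 * 24 * 3600},  # 3 days old -> delete
    {"name": "00001000.log", "last_ts": now - 3600},           # 1 hour old -> keep
    {"name": "00002000.log", "last_ts": now},                  # active -> always kept
]
print([s["name"] for s in clean_by_time(segments, now)])
# ['00001000.log', '00002000.log']
```

Cleaning whole segments rather than individual messages is what makes retention cheap: deletion is a file unlink, not a scan of the log.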

High Throughput

Kafka is designed for high throughput. When writing data to disk, it takes full advantage of sequential disk reads and writes. At the same time, Kafka uses zero-copy technology for data writing and synchronization: the sendfile() system call transfers data directly between two file descriptors and operates entirely in the kernel, avoiding copies between the kernel buffer cache and user-space buffers, which is extremely efficient. Kafka also supports data compression and batch sending, and each topic is divided into multiple partitions. Together, these give Kafka its well-proven high throughput.
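Python exposes the same system call as os.sendfile, which makes the kernel-to-kernel transfer easy to demonstrate. This sketch copies a file between two descriptors without ever reading the bytes into a user-space buffer (Linux; on other platforms os.sendfile may require the destination to be a socket):

```python
import os
import tempfile

payload = b"kafka log segment bytes" * 1000  # stand-in for segment data

with tempfile.TemporaryDirectory() as d:
    src_path = os.path.join(d, "segment.log")
    dst_path = os.path.join(d, "copy.log")
    with open(src_path, "wb") as f:
        f.write(payload)

    # Transfer directly between two file descriptors inside the kernel:
    # no read() into a user-space buffer, no write() back out.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        offset = 0
        while offset < len(payload):
            offset += os.sendfile(dst.fileno(), src.fileno(), offset,
                                  len(payload) - offset)

    with open(dst_path, "rb") as f:
        copied = f.read()

print(offset == len(payload), copied == payload)  # True True
```

Kafka applies the same idea when shipping log segments to consumers and follower replicas: the broker never copies message bytes through user space.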

High Availability

Each topic in Kafka has multiple partitions, and each partition has multiple replicas; replication in Kafka is therefore organized per partition. Replicas of the same partition hold the same message sequence. In a production environment, these replicas are scattered across different brokers on different machines, so that even if some brokers go down, the data stays reachable and the cluster remains available.
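Spreading a partition's replicas across brokers can be sketched with a round-robin placement. This is illustrative only; Kafka's actual assignment also randomizes the starting broker and can be rack-aware:

```python
def assign_replicas(num_partitions, brokers, replication_factor):
    """Place each partition's replicas on distinct brokers, round-robin
    (simplified sketch of rack-unaware replica assignment)."""
    assignment = {}
    for p in range(num_partitions):
        assignment[p] = [brokers[(p + r) % len(brokers)]
                         for r in range(replication_factor)]
    return assignment

# 3 partitions, 3 brokers, replication factor 2:
layout = assign_replicas(3, ["broker-0", "broker-1", "broker-2"], 2)
for partition, replicas in sorted(layout.items()):
    print(partition, replicas)
# 0 ['broker-0', 'broker-1']
# 1 ['broker-1', 'broker-2']
# 2 ['broker-2', 'broker-0']
```

No broker holds two replicas of the same partition, so losing any single broker still leaves one live replica per partition.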

