Scenario
Partition reassignment moves the replicas of a partition to different brokers to balance broker load.
Partition reassignment is required in the following scenarios:
• After brokers are added to an instance, the new brokers carry no load, so replicas of the existing topic partitions need to be migrated to them.
• The leader replica on a heavily loaded broker needs to be demoted to a follower.
• The number of replicas needs to be increased or decreased.
The DMS (Kafka) console provides automatic and manual reassignment. Automatic reassignment is recommended because it ensures that leaders are evenly distributed.
Operation Impact
• Reassigning partitions of topics that hold a large amount of data consumes significant network and storage bandwidth, so service requests may time out or latency may increase. Perform reassignment during off-peak hours. Before reassigning a topic's partitions, evaluate whether they can be balanced given the Kafka instance specifications and the current instance load, reserve enough bandwidth for the reassignment, and do not reassign partitions while CPU usage is above 90%.
• A throttle is an upper limit on the replication bandwidth of a topic, set so that other topics on the instance are not affected. Throttles apply to replication triggered by both normal message production and partition reassignment. If the throttle is too low, normal message production may be affected and the reassignment may never complete.
• Do not delete a topic whose reassignment task has started. Otherwise, the task will never complete.
• You cannot change the partition quantity of a topic whose reassignment task has started.
• Reassignment tasks cannot be stopped manually. Wait until they are complete.
• After partition reassignment, the metadata of the topic changes. If the producer does not retry failed requests, a few requests will fail and some messages will not be produced.
• Reassignment takes a long time if the topic holds a large amount of data. To speed up the migration, decrease the topic aging time based on how the topic is consumed so that historical data is deleted promptly.
Preparations for Partition Reassignment
• To reduce the amount of data to be migrated and speed up the migration, decrease the topic aging time as far as possible without affecting services, and wait for messages to age out. After the reassignment is complete, you can restore the aging time.
• Ensure that the target broker has sufficient disk capacity. If the remaining disk capacity of the target broker is close to the amount of data to be migrated to it, expand the disk capacity before the reassignment.
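The disk capacity check above can be sketched as a simple comparison. This is only an illustration: the function name and the 20% safety margin (`headroom`) are assumptions, since the text says only that free space should not be "close to" the amount of data to be migrated.

```python
def needs_disk_expansion(free_gb: float, incoming_gb: float, headroom: float = 0.2) -> bool:
    """Return True if free space does not cover the incoming data plus a margin.

    headroom is a hypothetical safety margin (20% by default), not a value
    taken from the product documentation.
    """
    return free_gb < incoming_gb * (1 + headroom)


# Hypothetical target broker: 100 GB free, 90 GB of replica data to receive.
print(needs_disk_expansion(free_gb=100, incoming_gb=90))  # True
print(needs_disk_expansion(free_gb=200, incoming_gb=90))  # False
```

Choose a margin that matches how fast the topic grows during the migration.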
Auto Reassignment
1. Go to the Kafka instance console.
2. Click Topic Management in the left menu bar to enter the topic list page.
3. Select the topic that needs reassignment and click More Options in the action bar to the right.
4. Set automatic reassignment parameters.
• Select the brokers to assign the topic's partition replicas to.
• Specify a throttle. The default value is -1, which means no throttle. If the instance load is light, keep the default. If a throttle is required, set it to at least the total production bandwidth of the topic being reassigned multiplied by its maximum number of replicas.
5. Click OK to jump to the topic list page.
6. In the upper left corner of the topic list, click Partition Reassignment Task to view the partition reassignment task status.
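The throttle guideline in step 4 (production bandwidth multiplied by the maximum replica count) can be sketched as follows. The function name and the sample figures are illustrative assumptions, not console values.

```python
def min_throttle_mb_per_s(production_mb_per_s: float, max_replicas: int) -> float:
    """Lower bound for the throttle: production bandwidth x maximum replica count."""
    if production_mb_per_s < 0 or max_replicas < 1:
        raise ValueError("bandwidth must be >= 0 and replica count must be >= 1")
    return production_mb_per_s * max_replicas


# Hypothetical topic: producers write 5 MB/s in total, at most 3 replicas per partition.
print(min_throttle_mb_per_s(5, 3))  # 15
```

Any throttle below this value risks slowing normal replication for the topic, so treat the result as a floor rather than a target.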
Manual Reassignment
1. Go to the Kafka instance console.
2. Click Topic Management in the left menu bar to enter the topic list page.
3. Select the topic that needs reassignment and click More Options in the action bar to the right.
4. Select Manual Reassignment to set the manual reassignment parameters.
• In the upper right corner of the Manual Reassignment dialog box, click Delete Replica or Add Replica to reduce or increase the number of replicas for each partition of the topic.
• Under the name of the replica to be reassigned, click the broker name or the icon next to it, and select the target broker to migrate the replica to. Assign replicas of the same partition to different brokers.
• Specify a throttle in Throttle. The default value is -1, which means no throttle. If the instance load is light, keep the default. If a throttle is required, set it to at least the total production bandwidth of the topic being reassigned multiplied by its maximum number of replicas.
5. Click OK to jump to the topic list page.
6. In the upper left corner of the topic list, click Partition Reassignment Task to view the partition reassignment task status.
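Before submitting a manual plan, it can help to confirm that no partition has two replicas on the same broker, as step 4 requires. The sketch below assumes the plan is written down as a mapping from partition number to a list of broker IDs. That representation and the function name are hypothetical; the console itself takes the plan through the dialog box.

```python
from typing import Dict, List


def find_conflicts(assignment: Dict[int, List[int]]) -> Dict[int, List[int]]:
    """Return, per partition, the brokers that hold more than one replica."""
    conflicts: Dict[int, List[int]] = {}
    for partition, brokers in assignment.items():
        seen = set()
        repeated: List[int] = []
        for broker in brokers:
            if broker in seen and broker not in repeated:
                repeated.append(broker)
            seen.add(broker)
        if repeated:
            conflicts[partition] = repeated
    return conflicts


# Hypothetical plan: partition 1 wrongly places two replicas on broker 1.
plan = {0: [0, 1, 2], 1: [1, 1, 2]}
print(find_conflicts(plan))  # {1: [1]}
```

An empty result means every partition spreads its replicas across distinct brokers.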
Calculating a Throttle
The throttle you need depends on the required reassignment duration, the leader/follower distribution of partition replicas, and the message production rate. In detail:
• A throttle limits the replication traffic of all partitions on a broker.
• Replicas added by the reassignment are regarded as followers, and existing replicas are regarded as leaders. Leaders and followers are throttled separately.
• Throttles do not distinguish between replication caused by normal message production and replication caused by partition reassignment, so the traffic generated in both cases is throttled.
• Assume that the partition reassignment task needs to be completed within 100 s and each replica has 100 MB of data. Calculate the throttle in the following scenarios:
Scenario 1: Topic 1 has two partitions with two replicas each, and Topic 2 has one partition with one replica. All leader replicas are on the same broker. One replica needs to be added to each partition of Topic 1 and Topic 2.
Table 1 Replica distribution before reassignment
Topic Name | Partition Name | Broker of Leader Replica | Brokers of All Replicas (Leader First) |
Topic1 | 0 | 0 | 0, 1 |
Topic1 | 1 | 0 | 0, 2 |
Topic2 | 0 | 0 | 0 |
Table 2 Replica distribution after reassignment
Topic Name | Partition Name | Broker of Leader Replica | Brokers of All Replicas (Leader First) |
Topic1 | 0 | 0 | 0, 1, 2 |
Topic1 | 1 | 0 | 0, 1, 2 |
Topic2 | 0 | 0 | 0, 2 |
As can be seen from the tables, three replicas need to fetch data from Broker 0. Each replica on Broker 0 has 100 MB of data. Broker 0 has only leader replicas, and Broker 1 and Broker 2 have only follower replicas. Therefore:
• Bandwidth required by Broker 0 to complete partition reassignment within 100 s = (100 MB + 100 MB + 100 MB)/100 s = 3 MB/s.
• Bandwidth required by Broker 1 to complete partition reassignment within 100 s = 100 MB/100 s = 1 MB/s.
• Bandwidth required by Broker 2 to complete partition reassignment within 100 s = (100 MB + 100 MB)/100 s = 2 MB/s.
In conclusion, to complete the partition reassignment task within 100 s, set the throttle to a value greater than or equal to 3 MB/s.
Scenario 2: Topic 1 and Topic 2 each have two partitions with one replica each. Leader replicas are on different brokers. One replica needs to be added to each partition of Topic 1 and Topic 2.
Table 3 Replica distribution before reassignment
Topic Name | Partition Name | Broker of Leader Replica | Brokers of All Replicas (Leader First) |
Topic1 | 0 | 0 | 0 |
Topic1 | 1 | 1 | 1 |
Topic2 | 0 | 1 | 1 |
Topic2 | 1 | 2 | 2 |
Table 4 Replica distribution after reassignment
Topic Name | Partition Name | Broker of Leader Replica | Brokers of All Replicas (Leader First) |
Topic1 | 0 | 0 | 0, 2 |
Topic1 | 1 | 1 | 1, 2 |
Topic2 | 0 | 1 | 1, 2 |
Topic2 | 1 | 2 | 2, 0 |
As can be seen from the tables, Broker 1 has only leader replicas, and Broker 0 and Broker 2 have both leader and follower replicas. Leader and follower replicas on Broker 0 and Broker 2 are throttled separately. Therefore:
• Bandwidth required by Broker 0 (leader) to complete partition reassignment within 100 s = 100 MB/100 s = 1 MB/s.
• Bandwidth required by Broker 0 (follower) to complete partition reassignment within 100 s = 100 MB/100 s = 1 MB/s.
• Bandwidth required by Broker 1 to complete partition reassignment within 100 s = (100 MB + 100 MB)/100 s = 2 MB/s.
• Bandwidth required by Broker 2 (leader) to complete partition reassignment within 100 s = 100 MB/100 s = 1 MB/s.
• Bandwidth required by Broker 2 (follower) to complete partition reassignment within 100 s = (100 MB + 100 MB + 100 MB)/100 s = 3 MB/s.
In conclusion, to complete the partition reassignment task within 100 s, set the throttle to a value greater than or equal to 3 MB/s.
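The per-broker arithmetic in Scenario 1 and Scenario 2 can be reproduced with a short script. This is a sketch under stated assumptions: every replica holds the same amount of data, each new replica fetches from the partition's current leader, leaders do not change during the reassignment, and no messages are produced meanwhile. The function name and the data layout are illustrative and not part of any Kafka or DMS API.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (topic, partition) -> brokers holding a replica, leader listed first
Assignment = Dict[Tuple[str, int], List[int]]


def required_bandwidth(before: Assignment, after: Assignment,
                       replica_mb: float, deadline_s: float):
    """Return (leader_mb_per_s, follower_mb_per_s), each keyed by broker ID."""
    leader = defaultdict(float)    # outbound replication served by leaders
    follower = defaultdict(float)  # inbound replication fetched by new replicas
    for key, new_replicas in after.items():
        old_replicas = before[key]
        source = old_replicas[0]   # assumption: new replicas fetch from the leader
        for broker in new_replicas:
            if broker not in old_replicas:
                leader[source] += replica_mb / deadline_s
                follower[broker] += replica_mb / deadline_s
    return dict(leader), dict(follower)


# Scenario 2: 100 MB per replica, 100 s deadline.
before = {("Topic1", 0): [0], ("Topic1", 1): [1],
          ("Topic2", 0): [1], ("Topic2", 1): [2]}
after = {("Topic1", 0): [0, 2], ("Topic1", 1): [1, 2],
         ("Topic2", 0): [1, 2], ("Topic2", 1): [2, 0]}

leader, follower = required_bandwidth(before, after, replica_mb=100, deadline_s=100)
# Leaders and followers are throttled separately, so the throttle must cover
# the largest single value on either side.
print(max(list(leader.values()) + list(follower.values())))  # 3.0
```

Feeding in the Scenario 1 tables instead gives 3 MB/s for Broker 0 as leader, and 1 MB/s and 2 MB/s for Broker 1 and Broker 2 as followers.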
Scenario 3: Topic 1 and Topic 2 each have one partition with two replicas. All leader replicas are on the same broker. One replica needs to be added to Topic 1. Messages are being produced to Topic 1, which generates 700 KB/s of replication traffic.
Table 5 Replica distribution before reassignment
Topic Name | Partition Name | Broker of Leader Replica | Brokers of All Replicas (Leader First) |
Topic1 | 0 | 0 | 0, 1 |
Topic2 | 0 | 0 | 0, 1 |
Table 6 Replica distribution after reassignment
Topic Name | Partition Name | Broker of Leader Replica | Brokers of All Replicas (Leader First) |
Topic1 | 0 | 0 | 0, 1, 2 |
Topic2 | 0 | 0 | 0, 1 |
As can be seen from the tables, the new replica on Broker 2 fetches data from Broker 0 for partition reassignment, and the existing follower on Broker 1 fetches data from Broker 0 for message production. Because the throttle does not distinguish between message production and partition reassignment, the traffic caused by both counts toward the limit. Therefore (with 700 KB/s = 0.7 MB/s):
• Bandwidth required by Broker 0 to complete partition reassignment within 100 s = (100 MB + 700 KB/s × 100 s)/100 s + 700 KB/s = 1.7 MB/s + 0.7 MB/s = 2.4 MB/s.
• Bandwidth required by Broker 2 to complete partition reassignment within 100 s = 100 MB/100 s = 1 MB/s.
In conclusion, to complete the partition reassignment task within 100 s, set the throttle to a value greater than or equal to 2.4 MB/s.
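The Broker 0 figure in Scenario 3 combines two kinds of traffic, which the following sketch separates. It assumes, as the arithmetic above does, that 1 MB = 1000 KB, that a new replica must copy both the existing data and whatever is produced before the deadline, and that each existing follower keeps replicating at the production rate. The function and parameter names are illustrative.

```python
def leader_bandwidth_with_production(replica_mb: float, deadline_s: float,
                                     produce_mb_per_s: float,
                                     new_replicas: int,
                                     existing_followers: int) -> float:
    """Leader-side bandwidth (MB/s) needed to finish within deadline_s."""
    # Each new replica copies the existing data plus what is produced meanwhile.
    catch_up = (replica_mb + produce_mb_per_s * deadline_s) / deadline_s
    # Each existing follower keeps replicating newly produced messages.
    steady = produce_mb_per_s
    return new_replicas * catch_up + existing_followers * steady


# Scenario 3: 100 MB replica, 100 s deadline, 0.7 MB/s production on Topic 1,
# one new replica (Broker 2) and one existing follower (Broker 1).
print(round(leader_bandwidth_with_production(100, 100, 0.7, 1, 1), 2))  # 2.4
```

With a production rate of 0, the result falls back to the plain data-size-over-deadline figure used in Scenario 1 and Scenario 2.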