Using MirrorMaker to Synchronize Data Across Clusters

2024-05-09 08:54:43

Application Scenarios

In the following scenarios, MirrorMaker can be used to synchronize data between different Kafka clusters to ensure the availability and reliability of the clusters:

Backup and disaster recovery: An enterprise has multiple data centers. To prevent service unavailability caused by a fault in one data center, cluster data is replicated to the other data centers as a backup.

Cluster migration: As enterprises migrate services to the cloud, data in on-premises clusters must be synchronized with that in cloud clusters to ensure service continuity.

Solution Architecture

MirrorMaker mirrors data from a source cluster to a target cluster. As shown in Figure 1, MirrorMaker consumes data from the source cluster and then produces the consumed data to the target cluster. For more information about MirrorMaker, see Mirroring Data Between Clusters.
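The consume-then-produce flow can be illustrated with the console tools shipped with Kafka. This is only a conceptual sketch and not how MirrorMaker is implemented: it assumes a Kafka version whose console producer accepts --bootstrap-server, and it does not preserve keys, headers, partitions, or offsets. The KAFKA_BIN variable, the naive_mirror function name, and the addresses are hypothetical.

```shell
# Conceptual sketch only: copy one topic by piping a console consumer
# into a console producer. MirrorMaker does far more than this
# (topic creation, offset tracking, continuous replication).

# Hypothetical variable: directory that holds the Kafka command-line scripts.
KAFKA_BIN="${KAFKA_BIN:-./bin}"

# Usage: naive_mirror <source bootstrap servers> <target bootstrap servers> <topic>
naive_mirror() {
  # Step 1 (consume): read the topic from the source cluster and stop
  # after 10 seconds without new messages.
  # Step 2 (produce): write every consumed line to the same topic
  # in the target cluster.
  "$KAFKA_BIN/kafka-console-consumer.sh" \
      --bootstrap-server "$1" \
      --topic "$3" --from-beginning --timeout-ms 10000 |
    "$KAFKA_BIN/kafka-console-producer.sh" \
      --bootstrap-server "$2" \
      --topic "$3"
}

# Example call with placeholder addresses:
#   naive_mirror 192.168.0.11:9092 192.168.1.21:9092 demo-topic
```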

Figure 1 Schematic diagram of MirrorMaker

Restrictions and Limitations

The IP addresses and port numbers of the nodes in the source cluster cannot be the same as those of the nodes in the target cluster. Otherwise, data will be replicated infinitely in a topic.

Use MirrorMaker to synchronize data between at least two clusters. If there is only one cluster, data will be replicated infinitely in a topic.
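As a pre-flight check for the two restrictions above, the bootstrap server lists of both clusters can be compared before MirrorMaker is configured. The following is a minimal sketch. The shared_endpoints function and the addresses are hypothetical, and the check compares only the literal host:port strings, so it will not notice two different host names that resolve to the same node.

```shell
# Print every host:port entry that appears in both comma-separated lists.
# $1: source bootstrap servers, $2: target bootstrap servers.
shared_endpoints() {
  printf '%s\n' "$1" | tr ',' '\n' | tr -d ' ' | sort -u |
  while IFS= read -r endpoint; do
    [ -n "$endpoint" ] || continue
    # Wrap the target list in commas so only whole entries can match.
    case ",$(printf '%s' "$2" | tr -d ' ')," in
      *",$endpoint,"*) printf '%s\n' "$endpoint" ;;
    esac
  done
}

# Placeholder addresses; replace them with your own bootstrap servers.
SOURCE_SERVERS="192.168.0.11:9092,192.168.0.12:9092"
TARGET_SERVERS="192.168.1.21:9092,192.168.1.22:9092"

overlap="$(shared_endpoints "$SOURCE_SERVERS" "$TARGET_SERVERS")"
if [ -n "$overlap" ]; then
  echo "Source and target clusters share endpoints: $overlap" >&2
else
  echo "No shared endpoints found."
fi
```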

Procedure

(1) Buy an Elastic Cloud Server (ECS) that can communicate with both the source and target clusters.

(2) Log in to the ECS, install the Java Development Kit (JDK), and configure the JAVA_HOME and PATH environment variables. In the following commands, /usr/local/java/jdk1.8.0_161 is an example JDK installation path. Replace it with the path where your JDK is installed.

export JAVA_HOME=/usr/local/java/jdk1.8.0_161

export PATH=$JAVA_HOME/bin:$PATH
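To confirm that the variables take effect, a quick check such as the following can be run in the same shell session. It is a small sketch; the check_java_home function name is hypothetical.

```shell
# Succeed only if JAVA_HOME points at a JDK with an executable java binary.
check_java_home() {
  if [ -n "${JAVA_HOME:-}" ] && [ -x "$JAVA_HOME/bin/java" ]; then
    # java prints its version banner to stderr, so merge it into stdout
    # and show only the first line.
    "$JAVA_HOME/bin/java" -version 2>&1 | head -n 1
  else
    echo "No executable java found under JAVA_HOME='${JAVA_HOME:-}'" >&2
    return 1
  fi
}

check_java_home || echo "Fix JAVA_HOME before starting MirrorMaker."
```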

(3) Download the Kafka binary software package from the following address and decompress it:

https://kafka.apache.org/downloads.html

(4) Go to the software package directory. In the config/connect-mirror-maker.properties configuration file, specify the IP addresses and ports of the source and target clusters, together with any other parameters you need.
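As an illustration of this step, the following sketch writes a minimal one-way configuration. The cluster aliases (source, target), the addresses, and the MM2_CONFIG variable are assumptions chosen for this example; the property names follow the sample file shipped with Kafka. An existing file at the same path is first copied to a .bak file because the script overwrites it.

```shell
# Placeholder addresses; replace them with your own bootstrap servers.
SOURCE_SERVERS="192.168.0.11:9092,192.168.0.12:9092"
TARGET_SERVERS="192.168.1.21:9092,192.168.1.22:9092"
# Hypothetical variable: path of the MirrorMaker configuration file.
MM2_CONFIG="${MM2_CONFIG:-config/connect-mirror-maker.properties}"

mkdir -p "$(dirname "$MM2_CONFIG")"
# Keep a copy of any existing file before overwriting it.
if [ -f "$MM2_CONFIG" ]; then
  cp "$MM2_CONFIG" "$MM2_CONFIG.bak"
fi

cat > "$MM2_CONFIG" <<EOF
# Aliases of the two clusters.
clusters = source, target
source.bootstrap.servers = ${SOURCE_SERVERS}
target.bootstrap.servers = ${TARGET_SERVERS}

# Replicate in one direction only (source to target), all topics.
source->target.enabled = true
source->target.topics = .*

# Replication factor of the topics that MirrorMaker creates in the
# target cluster. It must not exceed the number of target brokers.
replication.factor = 3
EOF
```

Replicating in one direction only keeps the example simple. To narrow the scope, replace the .* pattern with a regular expression that matches only the topics you want to synchronize.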

(5) In the software package directory, start MirrorMaker to synchronize data.

./bin/connect-mirror-maker.sh config/connect-mirror-maker.properties

Verifying Data Synchronization

View the topic list in the target cluster to check whether the topics of the source cluster appear in it.

Produce and consume messages in the source cluster, view the consumption progress in the target cluster, and check whether data has been synchronized from the source cluster to the target cluster.
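The topic check can be sketched as follows. It assumes MirrorMaker's default replication policy, under which a mirrored topic appears in the target cluster as the source cluster alias, a dot, and the original topic name. The alias, the address, and the KAFKA_BIN variable and mirrored_topics function are hypothetical; the listing runs only if the Kafka scripts are found.

```shell
# Hypothetical values; replace them with your own.
KAFKA_BIN="${KAFKA_BIN:-./bin}"
TARGET_SERVERS="192.168.1.21:9092"
SOURCE_ALIAS="source"

# Read topic names from standard input and keep only those that start
# with "<alias>.", the prefix added by the default replication policy.
mirrored_topics() {
  alias_prefix="$1."
  while IFS= read -r topic; do
    case "$topic" in
      "$alias_prefix"*) printf '%s\n' "$topic" ;;
    esac
  done
}

# List the topics of the target cluster and show the mirrored ones.
if [ -x "$KAFKA_BIN/kafka-topics.sh" ]; then
  "$KAFKA_BIN/kafka-topics.sh" --bootstrap-server "$TARGET_SERVERS" --list |
    mirrored_topics "$SOURCE_ALIAS"
fi
```

The filtered list may also contain MirrorMaker's own internal topics in addition to the mirrored business topics.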
