In DDS, Sharding can significantly improve the performance and processing capacity of a database. Sharding horizontally splits a database into multiple Shards and distributes the data across them, so that each Shard processes only part of the data and the load is shared. The following describes the steps for improving performance with Sharding and the advantages of Sharding:
Steps
Prepare Shard nodes: Before Sharding, prepare multiple Shard nodes. Shard nodes are instances to store Shards in the MongoDB cluster. Each Shard node can be an independent replica set of MongoDB to ensure high data availability.
Configure Config servers: The Config servers store the Sharding information and configuration information of the entire cluster. At least three Config servers are required to provide redundancy and availability.
Enable Sharding: On the routing node (mongos process) of DDS, run the sh.enableSharding() command with the database name to enable Sharding for that database.
Select a Shard key: Decide on the Shard key before you shard the collection. The Shard key is the field used to partition data: each document is routed to a Shard based on the value of this field. Selecting a proper Shard key is very important because it directly affects how evenly data is distributed and how well queries perform. Monotonically increasing Shard key: Be careful with a monotonically increasing Shard key (such as ObjectId or timestamp). With range-based Sharding, all new documents fall into the range that holds the highest key values, so inserts concentrate on a single Shard and create a hotspot. If such a field must be used, hash the Shard key so that inserts are spread across Shards.
Create a Shard collection: Run the sh.shardCollection() command to create the collection to be sharded and specify a Shard key.
Insert data: Insert data into the Shard collection. The document database distributes data across different Shards based on the value of the Shard key.
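The steps above can be sketched as a small helper for mongosh. This is a minimal sketch under stated assumptions, not a DDS-prescribed procedure: the function name setUpShardedCollection, the database name mydb, the collection name orders, and the customerId field are all hypothetical, and the choice of a hashed key is only for illustration. It assumes a connection to the mongos routing node, where sh and db are the shell's built-in objects; they are passed in as parameters so that the helper does not depend on globals.

```javascript
// Hypothetical helper: enables sharding for a database, shards one
// collection on a hashed key, and inserts a few sample documents.
// `sh` and `db` are the mongosh built-ins on a mongos connection.
function setUpShardedCollection(sh, db, dbName, collName, keyField) {
  const namespace = `${dbName}.${collName}`;

  // Enable sharding for the database.
  sh.enableSharding(dbName);

  // Shard the collection and specify the shard key ({ field: "hashed" }).
  sh.shardCollection(namespace, { [keyField]: "hashed" });

  // Insert data; the router places each document by its shard key value.
  const docs = [1, 2, 3].map((n) => ({ [keyField]: n, note: `sample ${n}` }));
  db.getSiblingDB(dbName).getCollection(collName).insertMany(docs);

  return namespace;
}
```

In mongosh, the helper could be called as setUpShardedCollection(sh, db, "mydb", "orders", "customerId"), after which sh.status() reports how the collection is distributed across the Shards.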
Advantages
Scalability: With Sharding, data can be distributed across multiple Shard nodes for scale out. As the amount of data increases, more Shard nodes can be added to improve the system performance, rather than relying on scale up of a single node.
Load balancing: With Sharding, data is distributed evenly across multiple Shards to avoid overload of a single node and achieve load balancing.
Query performance: The routing node automatically sends each query to the Shards that hold the relevant data. A query that spans multiple Shards is executed on those Shards in parallel, which can improve the query performance.
Data locality: When a proper Shard key is selected, related data can be stored on the same Shard to improve the query efficiency.
High availability: Each Shard can be an independent replica set to provide redundancy and high data availability.
Transparency: Sharding is transparent to applications. An application does not need to know how Sharding is implemented and can operate as if it were using a single database.
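The query performance and data locality points can be illustrated with a toy router. This is a simplified model, not the actual mongos logic: the key ranges, the Shard names shard0 to shard2, and the customerId field are hypothetical. It shows that a query containing the Shard key can be sent to a single Shard, while a query without it has to be sent to every Shard.

```javascript
// Toy model of a router over range-based key ranges; not the real mongos logic.
// Each range covers [min, max) of the shard key and lives on one shard.
const keyRanges = [
  { min: -Infinity, max: 1000, shard: "shard0" },
  { min: 1000, max: 2000, shard: "shard1" },
  { min: 2000, max: Infinity, shard: "shard2" },
];

// Returns the list of shards that a query has to visit.
function targetShards(query, keyField) {
  if (Object.prototype.hasOwnProperty.call(query, keyField)) {
    const value = query[keyField];
    const range = keyRanges.find((r) => value >= r.min && value < r.max);
    return [range.shard]; // targeted query: exactly one shard
  }
  // No shard key in the query: every shard must be asked.
  return [...new Set(keyRanges.map((r) => r.shard))];
}

// Query on the shard key: routed to shard1 only.
console.log(targetShards({ customerId: 1500 }, "customerId"));
// Query on another field: broadcast to all three shards.
console.log(targetShards({ status: "paid" }, "customerId"));
```

In this model, keeping related documents under nearby Shard key values is what lets a query stay on one Shard.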
Shard Key Policy
Hash a Shard key: Use the hash function to calculate the hash value of the Shard key, and then shard according to the hash value. In this way, data can be evenly distributed across different Shards to avoid data hotspots.
Compound Shard key: In some cases, the Shard key of a single field may not meet the requirements, and multiple fields can be combined to form a compound Shard key to better meet the query requirements.
Automatic Sharding: The document database provides automatic Sharding, which partitions data and routes it to the appropriate Shard according to the specified Shard key. Automatic Sharding takes effect for a collection once the sh.shardCollection() command has been run with a Shard key.
Pre-Sharding: For large data sets, you can split the Shard key ranges in advance and manually distribute them across different Shards before loading the data. This gives you better control over the distribution and load of the data.
Dynamically adjusting Shards: If data distribution is uneven or the load is imbalanced, the number of Shards or the key ranges they hold can be adjusted dynamically to rebalance data and load.
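The effect of hashing a Shard key can be sketched with a small simulation that compares range-based and hash-based placement for monotonically increasing keys. This is only an illustration under assumptions: the four Shards, the range boundaries, and the mix32 function are hypothetical stand-ins, and mix32 is not the hash function that the database itself uses.

```javascript
// Stand-in 32-bit integer mixer (murmur3-style finalizer).
// The database's own hashed index uses a different function.
function mix32(n) {
  let h = n >>> 0;
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0;
}

const SHARDS = 4;

// Ranged placement: boundaries were fixed when only keys 0..999 existed,
// so every key >= 750 belongs to the last shard.
function rangedShard(key) {
  const boundaries = [250, 500, 750];
  const idx = boundaries.findIndex((b) => key < b);
  return idx === -1 ? SHARDS - 1 : idx;
}

// Hashed placement: split the 32-bit hash space into equal ranges.
function hashedShard(key) {
  return Math.floor((mix32(key) / 2 ** 32) * SHARDS);
}

// Counts how many keys land on each shard under a placement function.
function countPerShard(place, keys) {
  const counts = new Array(SHARDS).fill(0);
  for (const k of keys) counts[place(k)] += 1;
  return counts;
}

// 1000 new, monotonically increasing keys (for example, a counter).
const newKeys = Array.from({ length: 1000 }, (_, i) => 1000 + i);
console.log("ranged:", countPerShard(rangedShard, newKeys)); // all on the last shard
console.log("hashed:", countPerShard(hashedShard, newKeys)); // spread over all shards
```

With ranged placement, every new key lands on the last Shard, which is the hotspot that hashing is meant to avoid. The trade-off is that hashed placement scatters adjacent key values, so range queries on the Shard key usually have to visit several Shards.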
Note:
Sharding should be carefully planned and implemented. The Shard key selection, the number of Shard nodes, and the redundancy of the Config servers all affect performance and stability. Therefore, careful evaluation and planning are required before Sharding to ensure that data is evenly distributed in the sharded cluster and that the performance improves as expected.