If you use the cluster version, you can create a sharded table by specifying a sharding key when the table is created. Subsequent reads and writes to the sharded table are routed to the appropriate shard server based on that sharding key.
Note that the cluster version can also hold non-sharded tables. If you do not explicitly create a sharded table after connecting to mongos, a non-sharded table is created by default and stored on a single shard server, which limits its capacity and performance.
Select sharding policy
The common sharding methods for DDS clusters include Range and Hash.
Range sharding
The range sharding method divides the data into multiple chunks according to the range of the sharding key. Each chunk (64 MB by default) stores a contiguous range of the data. Range sharding serves range queries well, but it has a clear drawback. If the service writes data with steadily increasing or decreasing sharding key values, most write operations land on the same shard, so adding shards does not increase write capacity.
Hash sharding
The hash sharding method first calculates the hash value of the sharding key, and then distributes documents into different chunks based on the range of hash values. Hash sharding keeps the data on each shard roughly balanced. The disadvantage is that for a range query, the mongos node must broadcast the request to all shard servers, which reduces query efficiency.
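The difference between the two policies under monotonically increasing writes can be sketched with a toy model. The shard count, the key range, and the toyHash function below are illustrative assumptions and do not reflect how the cluster actually hashes or places chunks.

```javascript
// Toy model: 4 shards and 1000 writes with increasing keys (all values hypothetical).
const SHARDS = 4;

// Range sharding: each shard owns one contiguous slice of 1000 key values.
function rangeShard(key) {
  return Math.min(SHARDS - 1, Math.floor(key / 1000));
}

// Hash sharding: a simple integer mixer stands in for the real hash function.
function toyHash(key) {
  let h = key >>> 0;
  h = Math.imul(h ^ (h >>> 16), 0x45d9f3b);
  h = Math.imul(h ^ (h >>> 16), 0x45d9f3b);
  return (h ^ (h >>> 16)) >>> 0;
}

function hashShard(key) {
  return toyHash(key) % SHARDS;
}

// Count how many writes each shard receives under a given routing function.
function countWrites(route, keys) {
  const counts = new Array(SHARDS).fill(0);
  for (const k of keys) counts[route(k)] += 1;
  return counts;
}

// A burst of monotonically increasing keys: 3000, 3001, ..., 3999.
const keys = Array.from({ length: 1000 }, (_, i) => 3000 + i);
const rangeCounts = countWrites(rangeShard, keys);
const hashCounts = countWrites(hashShard, keys);

console.log("range sharding:", rangeCounts); // every write lands on the last shard
console.log("hash sharding: ", hashCounts);  // writes are spread over the shards
```

In this model, range sharding sends the whole burst to one shard, while hashing scatters consecutive keys across the shards.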
Select sharding key
The selection of sharding key needs to ensure that the data distribution is sufficiently discrete, the storage capacity of each shard is balanced, and the data operations can be evenly distributed to all shards in the cluster.
If the sharding key is not properly selected, the load on the shards may become uneven, and jumbo chunks that cannot be split may appear. Moreover, once the sharding key is determined, it cannot be changed while the table is in use. To change it, you need to create a new table with the new sharding key and then re-import the data.
When selecting a sharding key, the following factors should be considered:
Distinctness of the sharding key
The number of distinct values of the sharding key (its cardinality) determines the maximum number of chunks. If the cardinality is too small, there will be very few chunks, which may lead to uneven load. For example, "gender" is not a reasonable sharding key, because it has only two values, "male" and "female", so there can be at most 2 chunks.
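A quick way to compare candidate keys is to count their distinct values on a data sample. The following is a minimal sketch; the users array and the distinctValues helper are hypothetical names used only for illustration.

```javascript
// Hypothetical sample of documents.
const users = [
  { userId: 1, gender: "male" },
  { userId: 2, gender: "female" },
  { userId: 3, gender: "female" },
  { userId: 4, gender: "male" },
  { userId: 5, gender: "male" },
  { userId: 6, gender: "female" },
];

// The number of distinct key values is an upper bound on the number of chunks,
// because documents with the same key value cannot be split across chunks.
function distinctValues(docs, field) {
  return new Set(docs.map((d) => d[field])).size;
}

const genderCardinality = distinctValues(users, "gender");
const userIdCardinality = distinctValues(users, "userId");

console.log("gender:", genderCardinality); // 2
console.log("userId:", userIdCardinality); // 6
```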
Whether the value distribution of the sharding key is even
If some values of the sharding key are hot spots, the shard load may also become uneven. For example, if "country" is used as the sharding key, the load will be uneven because of differences in population between countries. Countries with a large population will store more data and receive more requests, while countries with a small population will store less data and receive fewer requests. In this scenario, you can consider using a composite key as the sharding key to reduce the probability of hot spots.
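The effect of a composite key on a hot spot can be sketched as follows. The data set, the field names, and the largestKeyShare helper are assumptions made for illustration. The function reports what fraction of documents share the single most common key value, which is the portion that can never be split apart.

```javascript
// Hypothetical data set: 8 of 10 documents belong to country "A".
const docs = [];
for (let i = 0; i < 10; i++) {
  docs.push({ country: i < 8 ? "A" : "B", userId: i });
}

// Fraction of documents that share the most common sharding key value.
function largestKeyShare(documents, fields) {
  const counts = new Map();
  for (const d of documents) {
    const k = JSON.stringify(fields.map((f) => d[f]));
    counts.set(k, (counts.get(k) || 0) + 1);
  }
  return Math.max(...counts.values()) / documents.length;
}

const singleShare = largestKeyShare(docs, ["country"]);
const compositeShare = largestKeyShare(docs, ["country", "userId"]);

console.log("country only:     ", singleShare);    // 0.8
console.log("country + userId: ", compositeShare); // 0.1
```

With "country" alone, 80% of the sample is pinned to one key value. Adding a second, more varied field lets that data be divided into smaller ranges.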
Whether to write monotonically according to the sharding key
If the service reads and writes in increasing or decreasing order of the sharding key, requests may be concentrated on a single chunk at any given moment, and the advantages of multiple shards in the cluster cannot be fully utilized. For example, when logs are stored with range sharding on the log creation time, all new data at a given time is written to the same chunk. In this scenario, you can consider a composite sharding key or hash sharding to avoid this.
Whether the query model contains the sharding key
After determining the sharding key, you need to consider whether the sharding key is included in the query request of the service. The mongos node forwards the request to the corresponding shard server based on the sharding key in the query request. If the query request does not contain a sharding key, then the mongos node broadcasts the request to all the shard servers on the backend for scatter/gather queries, which degrades the query performance.
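The routing decision described above can be sketched as a small model. The chunk map, shard names, and the targetShards function are hypothetical. The real mongos routing logic is more involved, and this only illustrates targeted versus broadcast queries.

```javascript
// Hypothetical chunk map for a table range-sharded on "userId".
const SHARD_KEY = "userId";
const chunks = [
  { min: 0, max: 1000, shard: "shard_1" },
  { min: 1000, max: 2000, shard: "shard_2" },
  { min: 2000, max: 3000, shard: "shard_3" },
];

// Return the shards that must be contacted for an equality query.
function targetShards(query) {
  if (!(SHARD_KEY in query)) {
    // No sharding key in the query: broadcast to every shard (scatter/gather).
    return [...new Set(chunks.map((c) => c.shard))];
  }
  // Sharding key present: only the shard owning the matching chunk is contacted.
  const v = query[SHARD_KEY];
  return chunks.filter((c) => v >= c.min && v < c.max).map((c) => c.shard);
}

const targeted = targetShards({ userId: 1500 });
const broadcast = targetShards({ status: "active" });

console.log("with sharding key:   ", targeted);  // [ 'shard_2' ]
console.log("without sharding key:", broadcast); // all three shards
```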
Create sharded table and related considerations
Use the mongo shell client to connect to the mongos node. The steps and commands to create a sharded table are as follows.
1. Set the corresponding database to shard mode.
sh.enableSharding(<database>)
2. Specify a sharding key to create the sharded table.
sh.shardCollection(<namespace>, <key>, <unique>, <options>)
The parameters in the command are as follows:
namespace: The full name of the table, in the form of "<database>.<collection>", such as "mydb.myshardcollection".
key: Indicates the sharding key and sharding policy, 1 means range sharding, and "hashed" means hash sharding. For example, {"myshardKey": "hashed"}.
unique: Indicates whether the sharding key has a global uniqueness constraint. true means the key is unique. For hash sharding, this parameter can only be false.
options: Indicates optional sharding parameters. For hash sharding, numInitialChunks specifies the number of pre-allocated chunks, such as {numInitialChunks: 5}. The default size of each chunk is 64 MB. Pre-allocating chunks effectively reduces the performance jitter caused by chunk splits and migrations during writes.
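Putting the two steps together, a hash-sharded table might be created as sketched below. The database name, table name, and key are the example values from the parameter descriptions above, not required names. The typeof guard is an addition for illustration, so that the snippet also loads outside the mongo shell, where the sh helper does not exist.

```javascript
// Example values taken from the parameter descriptions; replace with your own.
var dbName = "mydb";
var namespace = dbName + ".myshardcollection";
var shardKey = { myshardKey: "hashed" };  // "hashed" selects hash sharding
var unique = false;                       // must be false for hash sharding
var options = { numInitialChunks: 5 };    // pre-allocate 5 chunks

// The sh helper is only defined inside a mongo shell connected to mongos.
if (typeof sh !== "undefined") {
  sh.enableSharding(dbName);                               // step 1
  sh.shardCollection(namespace, shardKey, unique, options); // step 2
}
```

For range sharding, the key would instead take the form {"myshardKey": 1}, and the numInitialChunks option would be left out.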