Document Database Service

Specifications on Designing DDS

2025-07-28 07:17:17

Database Design Specification

  • Database naming convention: db_xxxx.

  • Database names should be all lowercase. Avoid special characters other than _, and avoid names that start with a digit, such as 123_abc. Databases are stored as folders, so special characters or other non-standard naming can cause confusion.

  • The database name contains a maximum of 64 characters.

  • Before creating a new database, evaluate its expected data volume, QPS, and so on, and discuss with the DBA in advance whether to create a new database or a dedicated cluster for it.
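
The naming rules above can be checked before a database is created. The following is a minimal sketch, not part of DDS: checkDatabaseName is a hypothetical helper, and it assumes that a leading digit is disallowed, as it is for collection names.

```javascript
// Hypothetical helper: lists the naming conventions that a database name violates.
// An empty result means the name follows all of the conventions above.
function checkDatabaseName(name) {
  const problems = [];
  if (name.length > 64) problems.push("longer than 64 characters");
  if (name !== name.toLowerCase()) problems.push("contains uppercase letters");
  if (!/^[A-Za-z0-9_]*$/.test(name)) problems.push("contains special characters other than _");
  if (/^[0-9]/.test(name)) problems.push("starts with a digit");
  if (!name.startsWith("db_")) problems.push("does not follow the db_xxxx convention");
  return problems;
}

checkDatabaseName("db_orders"); // []
checkDatabaseName("123_abc");   // ["starts with a digit", "does not follow the db_xxxx convention"]
```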

Collection Design Specification

  • The collection name must be all lowercase. Do not use any special characters other than _. Do not use collection names starting with numbers, such as: 123_abc. Do not start with system, which is the system collection prefix.

  • The collection name contains a maximum of 64 characters.

  • Writes to large collections in a database affect the read and write performance of the other collections in it. If busy collections share one database, keep the number of collections to at most 80, and take disk I/O performance into account.

  • If a single collection holds a very large amount of data, you can split it into multiple smaller collections and store each one in a separate database, or shard the collection.

  • DDS collections can automatically clean up expired data: add a TTL index to a time field of the documents in the collection. The field must be of the Date type. Decide whether to use this feature based on actual business needs.

  • Capped collections: decide, based on actual business needs, whether a collection should be created as a capped (fixed-size) collection.
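
The TTL and capped collection options mentioned above could be set up roughly as follows. This is a sketch under stated assumptions: the two helper functions are hypothetical and only build arguments for the standard createIndex() and createCollection() shell methods, and the collection and field names in the usage comments are illustrative.

```javascript
// Hypothetical helpers that only build arguments; nothing here touches a server.

// Arguments for createIndex(): a TTL index on a Date field.
function ttlIndexArgs(dateField, expireAfterSeconds) {
  return [{ [dateField]: 1 }, { expireAfterSeconds: expireAfterSeconds }];
}

// Options for createCollection(): a capped collection limited by size and document count.
function cappedCollectionOptions(sizeBytes, maxDocuments) {
  return { capped: true, size: sizeBytes, max: maxDocuments };
}

// In the mongo shell these would be applied roughly like this:
//   db.session_log.createIndex(...ttlIndexArgs("created_at", 7 * 24 * 3600));
//   db.session_log.insertOne({ user_id: 42, created_at: new Date() }); // a Date, not a string
//   db.createCollection("recent_events", cappedCollectionOptions(100 * 1024 * 1024, 100000));
```

Documents whose TTL field holds a string or a number instead of a Date are not expired, which is why the insert above uses new Date().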

Collection Creation Rules

Different configurations can be used for different business scenarios:

db.createCollection("logs", {
  storageEngine: {
    wiredTiger: {
      configString: "internal_page_max=16KB,leaf_page_max=16KB,leaf_value_max=8KB,os_cache_max=1GB"
    }
  }
})

  • For a collection with many reads and few writes, try setting the page size as small as is practical when creating it, such as 16 KB, for example: internal_page_max=16KB, leaf_page_max=16KB, leaf_value_max=8KB, os_cache_max=1GB.

  • If such a read-heavy collection also holds a relatively large amount of data, you can set a compression algorithm for it, for example: block_compressor=zlib, internal_page_max=16KB, leaf_page_max=16KB, leaf_value_max=8KB.
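
The two read-heavy cases above differ only in the configString, so a small helper can build it. This is a minimal sketch: readMostlyCollectionOptions and the collection name in the usage comment are hypothetical, and the settings are exactly the ones quoted above.

```javascript
// Hypothetical helper: builds createCollection() options for a read-heavy collection.
// Pass true when the collection is also expected to hold a large amount of data.
function readMostlyCollectionOptions(largeDataVolume) {
  const settings = [];
  if (largeDataVolume) settings.push("block_compressor=zlib");
  settings.push("internal_page_max=16KB", "leaf_page_max=16KB", "leaf_value_max=8KB");
  if (!largeDataVolume) settings.push("os_cache_max=1GB");
  return { storageEngine: { wiredTiger: { configString: settings.join(",") } } };
}

// In the mongo shell:
//   db.createCollection("archive_logs", readMostlyCollectionOptions(true));
```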

Document Design Specification

  • Try not to use any special characters other than underscore (_) for keys in the collection.

  • Try to store documents of the same type in one collection and documents of different types in different collections. Keeping documents of the same type together greatly improves index utilization; if document types are mixed, queries may often require full collection scans.

  • Whenever possible, do not write custom content to _id.

  • DDS collections are similar to InnoDB tables: both are index-organized, with the data stored alongside the primary key, and _id is the default primary key in DDS. If _id values are not incremental, then once the data reaches a certain volume each write may force a large adjustment of the primary key's B-tree, which makes the write costly. Write performance therefore degrades as the data volume grows, so be sure not to write custom content to _id.

  • Try not to use array fields as query conditions.

  • If a field is large, compress it before storing it where possible.

  • Do not store long strings. If the field is a query condition, make sure that its value does not exceed 1 KB. MongoDB indexes only support values within 1 KB, so longer values are not indexed.

  • Try to store data with unified case.

  • If a single collection holds a very large amount of data, you can split it into multiple smaller collections and store each one in a separate database, or shard the collection.
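
The rules on unified case and the 1 KB limit for queried fields can be enforced before a document is written. The following is a sketch under stated assumptions: prepareIndexedString is a hypothetical helper, lowercase is assumed to be the unified case, and the size is measured in UTF-8 bytes.

```javascript
// Hypothetical pre-insert check for a string field that is used as a query condition.
const MAX_INDEXED_BYTES = 1024; // the 1 KB limit stated above

// Size of a string in UTF-8 bytes, which can exceed its length in characters.
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}

// Normalizes the case and rejects values too long to be indexed.
function prepareIndexedString(value) {
  const normalized = value.toLowerCase(); // assumption: lowercase is the unified case
  if (utf8ByteLength(normalized) > MAX_INDEXED_BYTES) {
    throw new RangeError("value exceeds 1 KB and would not be indexed");
  }
  return normalized;
}

prepareIndexedString("Alice@Example.COM"); // "alice@example.com"
```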

Index Design Specification

  • Compound indexes in DDS follow the same "leftmost prefix principle" as in MySQL.

  • Index names should not exceed 128 characters in length.

  • Evaluate the query scenarios as a whole and reduce the number of indexes by merging single-field indexes into compound indexes wherever possible, taking points 1 and 2 above into account.

  • Prefer covering indexes, which let a query be answered from the index alone.

  • When creating a compound index, evaluate the fields it contains and try to put the fields with high cardinality (many unique values) at the front.

  • DDS supports TTL indexes, which automatically delete documents older than a specified number of seconds. Check whether this type of index is required, and try to have the deletions take place during off-peak hours.

  • Creating an index on a large collection is slow in DDS, so evaluate which indexes will be needed and create them before going online or before the data volume grows.
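
The leftmost prefix rule for compound indexes can be illustrated with a small check. This is a simplified sketch: usesLeftmostPrefix is a hypothetical helper that only answers whether all fields of a query are served by a prefix of the index, ignoring partial use and sort order, and the collection and field names are illustrative.

```javascript
// Returns true when every field in the query belongs to the index prefix of the same length.
// Field names within one query are assumed to be distinct.
function usesLeftmostPrefix(indexFields, queryFields) {
  if (queryFields.length === 0 || queryFields.length > indexFields.length) return false;
  const prefix = indexFields.slice(0, queryFields.length);
  return queryFields.every((field) => prefix.includes(field));
}

const orderIndex = ["user_id", "status", "created_at"];
usesLeftmostPrefix(orderIndex, ["user_id"]);           // true
usesLeftmostPrefix(orderIndex, ["status", "user_id"]); // true, order within the query does not matter
usesLeftmostPrefix(orderIndex, ["status"]);            // false, the leading field is missing

// In the mongo shell, the index and a query it can cover would look roughly like this:
//   db.orders.createIndex({ user_id: 1, status: 1, created_at: -1 });
//   db.orders.find({ user_id: 42, status: "paid" },
//                  { _id: 0, user_id: 1, status: 1, created_at: 1 });
```

The projection excludes _id because _id is not part of the index; including it would force a document lookup and the query would no longer be covered.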

Shard Design Specification

In a DDS sharded cluster, the shard design is very important because it affects the performance, scalability, and reliability of the cluster. Here are some common DDS shard design specifications:

  • Selection of sharding key: The sharding key is a field or combination of fields used to shard a collection. Fields with high selectivity (that is, many distinct values) should be selected as the sharding key, such as fields that are accessed frequently or fields that are highly unique. Avoid using random values or fields with low selectivity as the sharding key, as this can lead to uneven data distribution and affect cluster performance and scalability.

  • Data type of sharding key: The data type of the sharding key should be appropriate for the application and the data. For example, if queries frequently use a timestamp, the timestamp can serve as the sharding key, stored in a suitable date-time data type.

  • Range of sharding key: The range of the sharding key should be selected so that data is distributed evenly when it is sharded. For example, if you use range queries, choose a sharding key with a wide range of values so that data is spread evenly across multiple shards.

  • Number of nodes in sharded cluster: The number of nodes in a sharded cluster should be selected based on data volume, load, and scalability requirements. It is usually recommended to set the number of nodes to a power of 2, such as 2, 4, 8, or 16, to make the cluster easier to manage and maintain.

  • Avoid using _id as the sharding key. _id is usually ordered, so it does not meet the requirement of being widely distributed, and frequent use of _id as a query condition is not conducive to sharding.

  • Backup and recovery of shards: When using DDS sharded clusters, data should be backed up and restored regularly to ensure data security and reliability. You can use the backup and recovery tools provided by DDS or third-party tools to complete these operations.
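
The selectivity of a candidate sharding key can be estimated on a sample of documents before the collection is sharded. The following is a minimal sketch: selectivity is a hypothetical helper, the sample documents and names are illustrative, and a real evaluation would use a much larger sample.

```javascript
// Hypothetical helper: ratio of distinct values to documents for one field.
// Values close to 1 indicate high selectivity; values close to 0 indicate low selectivity.
function selectivity(documents, field) {
  if (documents.length === 0) return 0;
  const distinct = new Set(documents.map((doc) => doc[field]));
  return distinct.size / documents.length;
}

const sample = [
  { order_id: "a1", region: "east" },
  { order_id: "b2", region: "east" },
  { order_id: "c3", region: "west" },
  { order_id: "d4", region: "east" },
];

selectivity(sample, "order_id"); // 1
selectivity(sample, "region");   // 0.5

// In the mongo shell, a collection is then sharded on the chosen field roughly like this:
//   sh.shardCollection("db_shop.orders", { order_id: 1 });
```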

It should be noted that when designing a DDS sharded cluster, you should select the appropriate shard design specifications based on the specific application and DDS environment. During the O&M process of sharded clusters, you must follow the best practices and specifications of DDS, such as properly managing data and monitoring cluster performance and faults.

