Under what circumstances does DDS trigger an active/standby switchover?
When the primary node of a cluster or replica set goes down, DDS triggers an active/standby switchover automatically. You can also trigger a switchover manually on the Basic Information page of the instance in the console. DDS provides a cluster connection string so that applications can continue to read and write normally after a switchover. We recommend using this connection method to access DDS.
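As a minimal sketch of what a cluster connection string looks like, the helper below assembles a standard MongoDB-style URI that lists several nodes. The function name, account, addresses, port, and replica set name are all hypothetical placeholders; use the values shown for your own instance.

```javascript
// Hypothetical helper: builds a MongoDB-style URI that lists every node,
// so the driver can locate the new primary after a switchover.
function buildClusterUri({ user, password, hosts, database, replicaSet }) {
  // Escape characters such as "@" and "/" that are not allowed in credentials.
  const credentials = `${encodeURIComponent(user)}:${encodeURIComponent(password)}`;
  // replicaSet is optional: omit it when connecting through mongos nodes.
  const query = replicaSet ? `?replicaSet=${encodeURIComponent(replicaSet)}` : "";
  return `mongodb://${credentials}@${hosts.join(",")}/${database}${query}`;
}

// All values below are placeholders, not real DDS defaults.
const uri = buildClusterUri({
  user: "rwuser",
  password: "p@ss/word",
  hosts: ["192.168.0.10:8635", "192.168.0.11:8635"],
  database: "test",
  replicaSet: "replica",
});
console.log(uri);
```

Because every node address is present in the URI, the application does not need to be reconfigured when the primary role moves to another node.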
How do I troubleshoot high disk utilization?
You can view the disk usage of the instance through monitoring. If disk utilization is high, troubleshoot the problem from the following angles:
Confirm the business data volume: connect to the database and run the show dbs command to check the current data volume of each database. If the business data volume is too large, expand the disk capacity of the instance.
Delete expired business data that is no longer needed.
DDS uses the WiredTiger storage engine by default. WiredTiger does not release disk space immediately when data is deleted; use the compact command to reclaim the space.
For sharded cluster instances, check whether data is unevenly distributed because of a poorly chosen shard key, and optimize the sharding configuration of the affected collections.
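The first and third checks above can be sketched as follows. The helper name and sample data are hypothetical; the sample is shaped like the result of the listDatabases command, which returns the same information as show dbs in a form that a script can process.

```javascript
// Hypothetical helper: ranks databases by the sizeOnDisk field of a
// listDatabases result and returns the largest ones, with sizes in GB.
function largestDatabases(listDatabasesResult, topN = 3) {
  return [...listDatabasesResult.databases]
    .sort((a, b) => b.sizeOnDisk - a.sizeOnDisk)
    .slice(0, topN)
    .map((d) => ({ name: d.name, sizeGB: +(d.sizeOnDisk / 1024 ** 3).toFixed(2) }));
}

// Sample data shaped like the output of db.adminCommand({ listDatabases: 1 }).
const sampleDatabases = {
  databases: [
    { name: "admin", sizeOnDisk: 40960 },
    { name: "orders", sizeOnDisk: 8 * 1024 ** 3 },
    { name: "logs", sizeOnDisk: 32 * 1024 ** 3 },
  ],
};
console.log(largestDatabases(sampleDatabases, 2));

// In the shell, against a real instance (the collection name is hypothetical):
//   largestDatabases(db.adminCommand({ listDatabases: 1 }))
//   db.runCommand({ compact: "orders" })  // reclaim space after deleting data
```

Depending on the database version, compact can affect other operations on the node while it runs, so it is usually safer to run it during off-peak hours.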
How long is the active/standby synchronization delay in a replica set?
The active/standby synchronization delay is affected by many factors, such as:
Network latency between the active and standby nodes: Synchronized data is transmitted over the network, so network latency adds directly to the synchronization delay.
Replication policy: Asynchronous replication often introduces greater latency.
Database load: If the load on the active node is high, write processing may slow down, which increases the synchronization delay between the active and standby nodes.
Transaction size: Large transactions often result in longer active/standby synchronization delays.
To check the active/standby synchronization delay in a replica set, connect to DDS and run the rs.printSlaveReplicationInfo() command.
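The same delay can also be derived from the output of rs.status(), which is convenient for scripted checks. The sketch below assumes the standard members, stateStr, and optimeDate fields; the helper name, node names, and timestamps are made up for illustration. On newer MongoDB shells the print helper is named rs.printSecondaryReplicationInfo(); whether that applies depends on your DDS version.

```javascript
// Hypothetical helper: returns how many seconds each SECONDARY is behind
// the PRIMARY, based on the optimeDate of each member in rs.status().
function replicationLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) throw new Error("no primary found in replica set status");
  return status.members
    .filter((m) => m.stateStr === "SECONDARY")
    .map((m) => ({
      name: m.name,
      // Subtracting two Date objects yields milliseconds.
      lagSeconds: (primary.optimeDate - m.optimeDate) / 1000,
    }));
}

// Sample data shaped like rs.status(); node names are placeholders.
const sampleStatus = {
  members: [
    { name: "node1:8635", stateStr: "PRIMARY", optimeDate: new Date("2024-01-01T00:00:10Z") },
    { name: "node2:8635", stateStr: "SECONDARY", optimeDate: new Date("2024-01-01T00:00:08Z") },
    { name: "node3:8635", stateStr: "SECONDARY", optimeDate: new Date("2024-01-01T00:00:10Z") },
  ],
};
console.log(replicationLagSeconds(sampleStatus));

// In the shell: replicationLagSeconds(rs.status())
```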
How is data synchronized between the active and standby nodes?
Data is synchronized asynchronously through the oplog: the standby node pulls oplog entries from the active node and replays them locally, which keeps its data consistent with the active node.
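The pull-and-replay cycle can be illustrated with a toy model. This is a simplified sketch, not the actual DDS implementation: real oplog entries carry more fields and richer update formats, and the function name, integer timestamps, and in-memory Map used as the standby's data are assumptions made for illustration.

```javascript
// Toy model of one synchronization round on the standby node.
function pullAndReplay(primaryOplog, standby) {
  // Pull: fetch only the entries newer than the last one already applied.
  const pending = primaryOplog.filter((e) => e.ts > standby.lastAppliedTs);
  // Replay: apply each entry in order and remember the new position.
  for (const entry of pending) {
    if (entry.op === "i") {
      standby.data.set(entry.o._id, entry.o); // insert
    } else if (entry.op === "u") {
      const current = standby.data.get(entry.o2._id);
      standby.data.set(entry.o2._id, { ...current, ...entry.o }); // update
    } else if (entry.op === "d") {
      standby.data.delete(entry.o._id); // delete
    }
    standby.lastAppliedTs = entry.ts;
  }
  return pending.length;
}

const oplog = [
  { ts: 1, op: "i", o: { _id: 1, qty: 5 } },
  { ts: 2, op: "u", o2: { _id: 1 }, o: { qty: 7 } },
  { ts: 3, op: "i", o: { _id: 2, qty: 1 } },
  { ts: 4, op: "d", o: { _id: 2 } },
];
// The standby has already applied the first entry.
const standby = { lastAppliedTs: 1, data: new Map([[1, { _id: 1, qty: 5 }]]) };
console.log(pullAndReplay(oplog, standby), standby.data);
```

Because the standby tracks the last applied position, a second call with the same oplog applies nothing, which mirrors how the standby only fetches entries it has not yet seen.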
How do I troubleshoot excessive memory usage in a sharded cluster?
You can reduce memory usage in the following ways:
Optimize queries and indexes: ensure that queries use appropriate indexes to avoid full collection scans and reduce memory pressure.
Upgrade the hardware configuration: expand the memory specification of the instance.
Limit the amount of data returned by the query to avoid returning large amounts of data at one time.
Adjust cache settings: tune the cache size with the 'cacheSizeGB' parameter. Increasing the cache size appropriately can help improve performance.
Choose a reasonable shard key so that data is distributed evenly across the shards.
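To check whether a query avoids a full scan, you can inspect its explain() output. The sketch below walks a winning plan and reports whether any stage is a collection scan (COLLSCAN). It assumes the classic explain layout, including the per-shard plans that a sharded cluster returns; the helper name, collection, and index name are hypothetical.

```javascript
// Hypothetical helper: returns true if any stage in the plan tree is a
// full collection scan.
function usesCollectionScan(plan) {
  if (!plan) return false;
  if (plan.stage === "COLLSCAN") return true;
  const children = [
    plan.inputStage,
    ...(plan.inputStages || []),
    // On a sharded cluster, each shard reports its own winning plan.
    ...(plan.shards || []).map((s) => s.winningPlan),
  ];
  return children.some((child) => usesCollectionScan(child));
}

// Sample plans shaped like explain().queryPlanner.winningPlan.
const indexedPlan = {
  stage: "SINGLE_SHARD",
  shards: [{ shardName: "shard_1", winningPlan: { stage: "FETCH", inputStage: { stage: "IXSCAN", indexName: "status_1" } } }],
};
const unindexedPlan = {
  stage: "SHARD_MERGE",
  shards: [{ shardName: "shard_1", winningPlan: { stage: "COLLSCAN" } }],
};
console.log(usesCollectionScan(indexedPlan), usesCollectionScan(unindexedPlan));

// In the shell (collection and filter are placeholders):
//   usesCollectionScan(db.orders.find({ status: "paid" }).explain().queryPlanner.winningPlan)
//   db.orders.find({ status: "paid" }).limit(100)  // cap the returned data
```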
How do I troubleshoot query errors that occur while data is continuously written to a DDS sharded cluster?
When data is continuously written to the cluster, query errors can occur for various reasons. Locate the error from the following aspects:
View the error log: First, check the error log of the DDS cluster to determine the specific error message and error type.
Check hardware resources: Ensure that hardware resources, such as CPU, memory, and disk space, are sufficient to support continuous write and query operations.
Check query statement: Ensure that the query statement uses correct syntax, has no spelling errors, and makes reasonable use of indexes.
Check the network connection: Ensure that network connections to and within the DDS cluster are normal. Network issues may cause query timeouts or failures.
Check the lock status: A lock that is held for a long time may block queries. Use the db.currentOp() command to view the operations that are currently executing and the locks they hold.
If none of the above methods solves the problem, send the specific error message, query statements, and cluster configuration to customer service, and R&D personnel will assist in troubleshooting.
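For the lock check, the output of db.currentOp() can be long, so it helps to filter it. The following sketch keeps only operations that are waiting for a lock or have been running for a while. It assumes the usual inprog, secs_running, and waitingForLock fields; the helper name, the 10-second threshold, and the sample data are arbitrary choices for illustration.

```javascript
// Hypothetical helper: picks out operations worth a closer look.
function suspiciousOps(currentOpResult, minSeconds = 10) {
  return currentOpResult.inprog
    .filter((op) => op.waitingForLock || (op.secs_running || 0) >= minSeconds)
    .map(({ opid, op, ns, secs_running, waitingForLock }) =>
      ({ opid, op, ns, secs_running, waitingForLock }));
}

// Sample data shaped like the result of db.currentOp().
const sampleOps = {
  inprog: [
    { opid: 101, op: "insert", ns: "shop.orders", secs_running: 42, waitingForLock: false },
    { opid: 102, op: "query", ns: "shop.orders", secs_running: 1, waitingForLock: true },
    { opid: 103, op: "query", ns: "shop.users", secs_running: 0, waitingForLock: false },
  ],
};
console.log(suspiciousOps(sampleOps));

// In the shell: suspiciousOps(db.currentOp())
```

In this sample, a long-running insert and a query waiting for a lock on the same namespace suggest that the write is blocking the read.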