This section describes high-risk operations in Cloud Container Engine and how to recover from them.
When deploying services on container clusters, users may perform high-risk operations that disrupt services to varying degrees. To help users anticipate and avoid these risks, this document lists common high-risk operations at the cluster node level, their potential consequences, and the recommended solution for each.
Node Type | High-Risk Operation | Consequences | Solution |
Master Node | Node expiration or deletion | Master node becomes unavailable. If it’s the only master, the entire cluster fails. | Irrecoverable |
Master Node | Manually modifying master or etcd versions | May cause cluster failure. | Revert to original versions. |
Master Node | Deleting or formatting core directories (e.g., /etc/kubernetes, /data/containerd) | Master node becomes unavailable. If it’s the only master, the entire cluster fails. | Irrecoverable |
Master Node | Reinstalling the OS | Master components are deleted. If it’s the only master, the entire cluster fails. | Irrecoverable |
Master Node | Removing critical kernel modules/files | Master node becomes unavailable. If it’s the only master, the entire cluster fails. | Irrecoverable |
Master Node | Modifying OS configurations | May cause master node failure. If it’s the only master, the entire cluster fails. | Manually restore original configurations. |
Master Node | Modifying core component parameters | May cause master node failure. | Restore default parameters. |
Master Node | Modifying /etc/resolv.conf or other key configs | May cause network failures or image pull errors. | Manually restore original configurations. |
Master Node | Manually replacing master/etcd certificates | May cause cluster failure. | Irrecoverable |
Master Node | Changing the node IP | Master node becomes unavailable. | Revert to the original IP. |
Master Node | High resource usage by workloads | May cause core component or node failure. | Clean up resources and set proper quotas. |
Master Node | Changing the hostname | Master node becomes unavailable. | Revert to the original hostname. |
Worker Node | Node deletion or expiration | Node becomes unavailable. | Irrecoverable |
Worker Node | Reinstalling the OS | Node becomes unavailable. | Irrecoverable |
Worker Node | Removing critical kernel modules/files | Node becomes unavailable. | Irrecoverable |
Worker Node | Modifying OS configurations | May cause node failure. | Attempt to restore original configurations. |
Worker Node | Modifying core component parameters | May cause node failure. | Restore default parameters. |
Worker Node | Deleting/modifying critical data directories or disks | Node becomes unavailable. | Irrecoverable |
Worker Node | Changing directory/container permissions | Permission errors. | Avoid modification. Restore original permissions if needed. |
Worker Node | Changing the node IP | Node becomes unavailable. | Revert to the original IP. |
Worker Node | High resource usage by workloads | May cause core component or node failure. | Clean up resources and set proper quotas. |
Worker Node | Changing the hostname | Node becomes unavailable. | Revert to the original hostname. |
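The "set proper quotas" remedy in the table above can be read in several ways. One possible interpretation, assuming a standard Kubernetes cluster, is a per-namespace `ResourceQuota`. The sketch below builds such a manifest as JSON. The helper name `build_resource_quota`, the namespace `team-a`, and every limit value are hypothetical placeholders, not values recommended by this document.

```python
import json


def build_resource_quota(namespace, cpu, memory, pods):
    """Return a Kubernetes ResourceQuota manifest as a plain dict.

    All values passed in are illustrative; size them for your own workloads.
    """
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": f"{namespace}-quota", "namespace": namespace},
        "spec": {
            "hard": {
                # Cap the total CPU and memory that pods in the namespace
                # may request, and the total they may be limited to.
                "requests.cpu": cpu,
                "requests.memory": memory,
                "limits.cpu": cpu,
                "limits.memory": memory,
                # Cap the number of pods in the namespace.
                "pods": str(pods),
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical namespace and limits, for illustration only.
    manifest = build_resource_quota("team-a", cpu="8", memory="16Gi", pods=50)
    print(json.dumps(manifest, indent=2))
```

Because `kubectl apply -f -` accepts JSON on standard input, the printed manifest could be piped to it directly. Whether a namespace quota is the right control depends on the workload, and per-container requests and limits may also be needed.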
When nodes are enabled in a container cluster, the system automatically creates hidden security group rules that keep the nodes interconnected. Do not modify these rules; they are listed below for reference.
Direction | Action | IP Version | Priority | Protocol | CIDR Block | Port Range | Solution |
Inbound | Allow | IPv4 | 99 | Any | VPC CIDR | All Ports | Do not modify this security group rule |
Inbound | Allow | IPv6 | 99 | Any | VPC CIDR | All Ports | Do not modify this security group rule |
Inbound | Allow | IPv6 | 99 | Any | 100::/16 | All Ports | Do not modify this security group rule |
Outbound | Allow | IPv4 | 99 | Any | 0.0.0.0/0 (All addresses) | All Ports | Do not modify this security group rule |
Outbound | Allow | IPv6 | 99 | Any | ::/0 (All IPv6 addresses) | All Ports | Do not modify this security group rule |
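To make the table above easier to audit, here is a minimal sketch that encodes the five rules as data and reports any that are missing from a given rule set. It assumes the current rules have already been obtained by some means, which is outside the scope of this sketch. The `Rule` type, the `missing_rules` helper, and the `VPC_CIDR` placeholder are hypothetical names introduced for illustration and are not part of any Cloud Container Engine API.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    direction: str
    action: str
    ip_version: str
    priority: int
    protocol: str
    cidr: str
    ports: str


# Placeholder: substitute the actual CIDR block of the cluster's VPC.
VPC_CIDR = "VPC CIDR"

# The five rules listed in the table above.
REQUIRED_RULES = frozenset({
    Rule("Inbound", "Allow", "IPv4", 99, "Any", VPC_CIDR, "All Ports"),
    Rule("Inbound", "Allow", "IPv6", 99, "Any", VPC_CIDR, "All Ports"),
    Rule("Inbound", "Allow", "IPv6", 99, "Any", "100::/16", "All Ports"),
    Rule("Outbound", "Allow", "IPv4", 99, "Any", "0.0.0.0/0", "All Ports"),
    Rule("Outbound", "Allow", "IPv6", 99, "Any", "::/0", "All Ports"),
})


def missing_rules(current):
    """Return the required rules that are absent from `current`, sorted."""
    return sorted(REQUIRED_RULES - set(current))


if __name__ == "__main__":
    # Simulate a rule set in which the IPv4 outbound rule was removed.
    dropped = Rule("Outbound", "Allow", "IPv4", 99, "Any", "0.0.0.0/0", "All Ports")
    for rule in missing_rules(REQUIRED_RULES - {dropped}):
        print("missing:", rule)
```

An empty result means every rule from the table is still present. The check only detects removed or altered rules and does not flag extra rules that were added.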