Health Check
Scenario
Use this guide to run regular health checks on the cluster as part of routine maintenance. A health check verifies that cluster parameters, configurations, and monitoring are working normally, which supports long-term stability.
Scope
A cluster health check examines every component at the Manager service level. A service-level check covers whether each component can provide its services normally, the status of its alarms, and the inspection metrics specific to that component.
Procedure
Cluster Health Check
1. Initiate a health check for all services manually.
On the cluster details page, click More Operations > Start Cluster Health Check in the upper right corner of the cluster service list.
2. Set up a periodic health check for all services.
Go to Cluster Service > Health Check Report and click the Settings button in the upper right corner of the list. A dialog for periodic task settings appears, where you can set the interval in days, weeks, or months and pair it with a specific time. Once the task is created, health checks run on that schedule for the supported component services.
Note
• In the YI-MapReduce Manager interface, select Cluster Settings > More Operations > Health Check to initiate a cluster health check. The health check launched from the cluster list page targets all component services in the list.
• After you click Health Check, you are redirected to the Health Check Report tab, which displays the list of cluster health checks. The most recently initiated health check is listed at the top. Expand a first-level entry to view the check details for each component.
• Under the operations for a first-level entry, click Export Report to export the report as a CSV file that you can review locally.
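The periodic settings in step 2 combine an interval (days, weeks, or months) with a time of day. As a rough illustration of how such a schedule resolves to a concrete next run, the Python sketch below computes the next trigger time. It is not YI-MapReduce code: the function name next_run, the unit strings, and the weekday and day_of_month parameters are assumptions made for this example, and the product may treat edge cases such as short months differently.

```python
import calendar
from datetime import date, datetime, time, timedelta


def next_run(now: datetime, unit: str, at: time,
             weekday: int = 0, day_of_month: int = 1) -> datetime:
    """Return the first scheduled run strictly after `now`.

    unit         -- "day", "week", or "month" (hypothetical values)
    at           -- time of day at which the check should start
    weekday      -- 0 = Monday ... 6 = Sunday, used when unit == "week"
    day_of_month -- 1..31, used when unit == "month"; clamped to the
                    last day of shorter months (an assumption)
    """
    candidate = datetime.combine(now.date(), at)

    if unit == "day":
        if candidate <= now:
            candidate += timedelta(days=1)
        return candidate

    if unit == "week":
        # Move forward to the requested weekday in the current week.
        candidate += timedelta(days=(weekday - now.weekday()) % 7)
        if candidate <= now:
            candidate += timedelta(days=7)
        return candidate

    if unit == "month":
        year, month = now.year, now.month
        while True:
            last_day = calendar.monthrange(year, month)[1]
            run_day = date(year, month, min(day_of_month, last_day))
            candidate = datetime.combine(run_day, at)
            if candidate > now:
                return candidate
            # Try the following month.
            month += 1
            if month > 12:
                month, year = 1, year + 1

    raise ValueError(f"unsupported unit: {unit!r}")
```

For example, calling next_run(datetime(2024, 1, 31, 3, 0), "day", time(2, 0)) returns the following day at 02:00, because the 02:00 slot on 31 January has already passed.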
View and Export Check Reports
Scenario
For a detailed analysis of health check outcomes, you can view and export health check results on YI-MapReduce.
Scope
Platform health checks cover the Manager service level.
A cluster health check covers three aspects: the service status of each checked object, alarm information, and the metrics specific to each component.
Before You Begin
A health check has been conducted.
Procedure
1. On the cluster details page, click Management Operations > View Cluster Health Check Report to view the health check report.
2. On the health check report panel, click Export Report to export the health check report. After the download completes, you can view the complete details of the check items locally. Both cluster and host health check reports are exported in CSV format.
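Because the report is exported as CSV, it can be reviewed with a short script as well as in a spreadsheet. The Python sketch below lists the rows whose result is not in a set of acceptable values. The column layout of the exported file is not described in this guide, so the column name and the "healthy" values are parameters you must supply after inspecting a real export; the names used in the sample data ("Component", "Check Item", "Result", "Healthy") are purely hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO


def unhealthy_rows(report: TextIO, result_column: str,
                   ok_values: Iterable[str]) -> List[Dict[str, str]]:
    """Return report rows whose result is not one of `ok_values`.

    report        -- open text file (or file-like object) containing the CSV
    result_column -- header of the column holding the check result
                     (hypothetical; check your exported file)
    ok_values     -- results that count as healthy, compared case-insensitively
    """
    ok = {value.strip().lower() for value in ok_values}
    reader = csv.DictReader(report)
    if reader.fieldnames is None or result_column not in reader.fieldnames:
        raise KeyError(f"column {result_column!r} not found in report")
    return [
        row for row in reader
        if (row[result_column] or "").strip().lower() not in ok
    ]


# Hypothetical sample data standing in for an exported report.
sample = io.StringIO(
    "Component,Check Item,Result\n"
    "HDFS,Service status,Healthy\n"
    "YARN,Alarm count,Unhealthy\n"
)
problems = unhealthy_rows(sample, result_column="Result", ok_values=["Healthy"])
for row in problems:
    print(row["Component"], row["Check Item"], row["Result"])
```

To run this against a real export, replace the sample with open("report.csv", newline="", encoding="utf-8") and adjust the column name and healthy values to match the file.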
View YI-MapReduce Service Operation Logs
Operation Log Access
1. In the Cluster Name column of the cluster list, click the name of the cluster whose logs you want to view. The cluster information page opens.
2. On the cluster information page, click Operation Logs to go to the operation logs page.
Operation Type
The YI-MapReduce service operation logs currently provide filters to help you locate the source of an issue quickly. After selecting filter conditions, click Query to search the logs or Reset to clear the conditions.
1. Operation Status: Click the operation status filter box and select a condition: "Execution successful", "Failed", or "In Progress".
2. Time Range: In the time range filter box, click Start Date and End Date to set the period to search.
Log Field
The following table describes the log fields.
Parameter | Description |
Action Name | The name of the executed operation, such as creating a cluster or upgrading configurations. |
Status | The status of the operation: succeeded, failed, or in progress. |
Operation Range | The scope of the operation, such as cluster, node, or node group. |
Operation User | The user who executed the operation. |
Start Time/End Time | The start and end times of the operation. |
Remarks | Notes explaining why an operation failed. |
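To make the relationship between the log fields and the two filters concrete, the Python sketch below models one log entry with the fields from the table and applies a status and time-range filter. It only illustrates the filtering logic and is not an interface to YI-MapReduce: the class OperationLog, the function filter_logs, and the choice to match the time range against the start time are all assumptions for this example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class OperationLog:
    """One operation log entry, mirroring the fields in the table above."""
    action_name: str
    status: str                      # e.g. "Failed" (value strings are assumed)
    operation_range: str             # e.g. cluster, node, node group
    operation_user: str
    start_time: datetime
    end_time: Optional[datetime] = None   # None while still in progress
    remarks: str = ""


def filter_logs(logs: List[OperationLog],
                status: Optional[str] = None,
                start: Optional[datetime] = None,
                end: Optional[datetime] = None) -> List[OperationLog]:
    """Keep entries that match the status and started within [start, end].

    A filter left as None is not applied, which corresponds to an
    empty filter box.
    """
    matched = []
    for log in logs:
        if status is not None and log.status != status:
            continue
        if start is not None and log.start_time < start:
            continue
        if end is not None and log.start_time > end:
            continue
        matched.append(log)
    return matched


# Hypothetical entries for illustration only.
logs = [
    OperationLog("Create cluster", "Execution successful", "cluster",
                 "alice", datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 1, 9, 30)),
    OperationLog("Upgrade configuration", "Failed", "node group",
                 "bob", datetime(2024, 3, 5, 14, 0), datetime(2024, 3, 5, 14, 5),
                 remarks="Insufficient quota"),
]
failed_in_march = filter_logs(logs, status="Failed",
                              start=datetime(2024, 3, 1),
                              end=datetime(2024, 3, 31, 23, 59, 59))
```

Here failed_in_march contains only the "Upgrade configuration" entry, and its remarks field carries the failure explanation.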