Data Transmission Service

Using the Data Audit Feature to Compare Synchronization Items

2023-10-30 03:34:46

What Is Data Audit

The data audit feature is used to compare data between the source and destination databases and help you identify inconsistent data. You can use this feature to compare the data from different dimensions, such as the number of objects, the number of rows, and field content.

Data Audit Types

Object Comparison

You can compare the number of objects such as databases, indexes, tables, and views between the source and destination databases to check whether certain data objects are missing.

The following data links are supported: MySQL -> MySQL and PostgreSQL -> PostgreSQL.

Row Count Comparison

You can compare the row count between the source and destination databases to check whether certain table data is missing.

The following data links are supported: MySQL -> MySQL and PostgreSQL -> PostgreSQL.

Content Comparison

You can compare the row data in the tables between the source and destination databases to check whether the row data is consistent.

The following data links are supported: MySQL -> MySQL and PostgreSQL -> PostgreSQL.

You can set the data chunk size: You can flexibly control the size of the data chunk for content comparison. The valid range for the size is 1000 to 10000.

You can sample data for verification: If the amount of data to be migrated or synchronized is excessively large, verifying all data results in heavy workloads on the source and destination databases. Therefore, you can sample data for content comparison. To do so, you can set a sampling ratio to reduce database workloads: 10%, 20%, 30%, ..., 100%.

You can set a task timeout period: You can set the maximum allowed amount of time to run the content comparison task. If the task fails to be completed within the specified period, the task is automatically stopped. The valid value range is 1 to 180 minutes.

You can view the range of inconsistent data: If inconsistent data is found, you can view the corresponding range, which narrows down the data scope that you need to check.

Procedure

  1. On the Instance List page in the Data Sync module, select the corresponding instance, and click the instance ID to go to the Instance Management page.

  2. On the Instance Management page, click Instance Details to go to the details page.

  3. On the details page, click Data Audit in the left-side navigation bar to go to the Data Audit page.

  4. Select Content Comparison. Then, click Create Comparison Task.

  5. On the task creation page, configure the comparison task parameters: the tables to be compared, chunk size, sampling ratio, and timeout period.

  6. Once a task is running or completed, click the comparison task to go to the details page and view the comparison information. This information includes the task status, comparison result, the number of tables compared, the number of inconsistent tables, and the number of inconsistent data chunks.

Precautions

  1. Fields of the following data types are excluded from the comparison:
    When a MySQL database serves as the source database: TINYBLOB, BLOB, MEDIUMBLOB, LONGBLOB, TINYTEXT, TEXT, MEDIUMTEXT, and LONGTEXT.
    When a PostgreSQL database serves as the source database: custom DOMAIN or TYPE.

  2. During content comparison, tables with the following characteristics are automatically skipped:
    The table is an empty table that contains no data.
    The table has no primary key, unique field, or unique index.
    The primary key, unique field, or unique index of the table is a composite one that consists of multiple fields.
    The fields that consist of the primary key, unique field, and unique index are not of the following data types:
    The primary key, unique field, or unique index of the table is excluded from the following types:
    When a MySQL database serves as the source database: TINYINT, SMALLINT, MEDIUMINT, INT (INTEGER), BIGINT, CHAR, VARCHAR, FLOAT, DOUBLE, and DECIMAL.
    When a PostgreSQL database serves as the source database: SMALLINT (INT2), INT (INT4 and INTEGER), BIGINT (INT8), CHAR, BPCHAR, VARCHAR, UUID, DECIMAL (NUMERIC), FLOAT4 (REAL), and FLOAT8 (DOUBLE PRECISION).

  3. You are recommended to perform the comparison after the full data synchronization is completed. If the task is migrating or synchronizing incremental data, you are recommended to perform the comparison during off-peak hours to obtain more accurate results.

  4. If multiple large tables need to be compared, you are recommended to create different tasks to compare them in batches. This helps avoid heavy database workloads caused by comparing multiple large tables at a time.


HL1rDjHoH3SD