DM precheck failed to check GTID when stop and then start from checkpoint with same task config #11648

Open
D3Hunter opened this issue Oct 11, 2024 · 2 comments · May be fixed by #11668
Labels
area/dm Issues or PRs related to DM. good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. severity/moderate type/bug The issue is confirmed as a bug.

Comments

D3Hunter (Contributor) commented Oct 11, 2024

What did you do?

Below are the steps I summarized from user feedback. I haven't reproduced it locally, but from the code it seems it will fail:

  • start a DM incremental task with an explicit GTID
  • run it for a while, then purge some binlogs, and run it for another while
  • stop the task; the checkpoint is kept
  • start it again with the same config; this time we actually start from the checkpoint
  • the "meta_position" precheck fails with "ERROR 1236 (HY000): The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires."

In precheck we always use the position from the task config, but since we are recovering from the checkpoint, we should use the checkpoint position instead, as the checkpoint position is the real position we will start from. We might also need to consider pingcap/dm#1418. The relevant code:

if c.enableGTID {
    // per the report above, c.meta always holds the GTID from the task config,
    // so the checkpoint GTID is never considered here
    gtidSet, err2 := gtid.ParserGTID(flavor, c.meta.BinLogGTID)
    if err2 != nil {
        markCheckError(result, err2)
        result.Instruction = "you should check your BinlogGTID's format, "
        if flavor == mysql.MariaDBFlavor {
            result.Instruction += "it should consist of three numbers separated with dashes '-', see https://mariadb.com/kb/en/gtid/"
        } else {
            result.Instruction += "it should be any combination of single GTIDs and ranges of GTID, see https://dev.mysql.com/doc/refman/8.0/en/replication-gtids-concepts.html"
        }
        return result
    }
    streamer, err = syncer.StartSyncGTID(gtidSet)
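
A minimal, self-contained sketch of the direction suggested above (not the actual change in #11668). It uses the go-mysql library directly instead of DM's internal gtid package, and the pickStartGTID helper plus the sample values are illustrative assumptions:

package main

import (
    "fmt"

    "github.com/go-mysql-org/go-mysql/mysql"
)

// pickStartGTID is a hypothetical helper: when a checkpoint exists, the
// precheck should validate the checkpoint GTID set rather than the one
// from the task config, because the syncer will actually resume from it.
func pickStartGTID(configGTID, checkpointGTID string) string {
    if checkpointGTID != "" {
        return checkpointGTID
    }
    return configGTID
}

func main() {
    const uuid = "faa40251-8a9e-11ef-a97a-0242ac110002"
    start := pickStartGTID(uuid+":1-11", uuid+":1-14") // config GTID vs. checkpoint GTID

    // Same kind of validation the existing precheck performs on c.meta.BinLogGTID.
    gtidSet, err := mysql.ParseGTIDSet(mysql.MySQLFlavor, start)
    if err != nil {
        fmt.Println("invalid GTID set:", err)
        return
    }
    fmt.Println("precheck would validate:", gtidSet.String())
}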

What did you expect to see?

No response

What did you see instead?

No response

Versions of the cluster

dm version 7.5.3, upstream version unknown right now.

current status of DM cluster (execute query-status <task-name> in dmctl)

(paste current status of DM cluster here)
@D3Hunter D3Hunter added type/bug The issue is confirmed as a bug. area/dm Issues or PRs related to DM. good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. labels Oct 11, 2024
fishiu (Contributor) commented Oct 12, 2024

/assign

fishiu (Contributor) commented Oct 15, 2024

My attempt to reproduce, but without the expected error:

  1. Start a mysql docker instance with --log-bin=mysql-bin --binlog-format=ROW --server-id=1 --gtid_mode=ON --enforce-gtid-consistency=true;
  2. Insert data into mysql, and start a mode=full dm task to initialize, then stop the full task;
    • show master status: xxx:1-11
    • show binary logs: mysql-bin.000001 to mysql-bin.000003
  3. Start a mode=incremental dm task (config gtid xxx:1-11);
  4. Insert data into mysql (saved to mysql-bin.000003)
    • show master status: xxx:1-13
    • show binary logs: mysql-bin.000001 to mysql-bin.000003
  5. Flush logs and purge binary logs to 'mysql-bin.000004'
    • show master status: xxx:1-13
    • show binary logs: mysql-bin.000004
  6. Insert data into mysql (saved to mysql-bin.000004)
    • show master status: xxx:1-14
    • show binary logs: mysql-bin.000004
  7. Stop incremental dm task
  8. Start the incremental dm task again (same config gtid xxx:1-11), while the checkpoint is xxx:1-14. The prechecker checks based on 1-11 and finds 12-14 missing. However, 12-13 has been purged, and we get this error:
    • ERROR 1236 (HY000): Cannot replicate because the source purged required binary logs. Replicate the missing transactions from elsewhere, or provision a new replica from backup. Consider increasing the source's binary log expiration period. The GTID set sent by the replica is 'faa40251-8a9e-11ef-a97a-0242ac110002:1-11', and the missing transactions are 'faa40251-8a9e-11ef-a97a-0242ac110002:12-13'
    • This is different from the error mentioned above: "ERROR 1236 (HY000): The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires."
> dmctl start-task precheck/incr-task.yaml
{
    "result": false,
    "msg": "",
    "sources": [
    ],
    "checkResult": "[code=26005:class=dm-master:scope=internal:level=medium], Message: fail to check synchronization configuration with type: check was failed, please see detail
        detail: {
                "results": [
                        {
                                "id": 2,
                                "name": "meta position check",
                                "desc": "check whether meta position is valid for db",
                                "state": "fail",
                                "errors": [
                                        {
                                                "severity": "fail",
                                                "short_error": "ERROR 1236 (HY000): Cannot replicate because the source purged required binary logs. Replicate the missing transactions from elsewhere, or provision a new replica from backup. Consider increasing the source's binary log expiration period. The GTID set sent by the replica is 'faa40251-8a9e-11ef-a97a-0242ac110002:1-11', and the missing transactions are 'faa40251-8a9e-11ef-a97a-0242ac110002:12-13'"
                                        }
                                ],
                                "instruction": "you should make sure your meta's binlog position is valid and not purged, and the user has REPLICATION SLAVE privilege",
                                "extra": "address of db instance - 127.0.0.1:3308"
                        }
                ],
                "summary": {
                        "passed": false,
                        "total": 11,
                        "successful": 10,
                        "failed": 1,
                        "warning": 0
                }
        }"
}
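
The numbers above line up with a simple GTID-set containment check. Below is a minimal sketch using the go-mysql library; the library calls are real, the sets are taken from the steps above, and treating @@gtid_purged as xxx:1-13 after step 5 is an assumption for illustration:

package main

import (
    "fmt"

    "github.com/go-mysql-org/go-mysql/mysql"
)

func main() {
    const uuid = "faa40251-8a9e-11ef-a97a-0242ac110002"

    // @@gtid_purged after step 5 (assumption: the purged binlogs held 1-13).
    purged, _ := mysql.ParseGTIDSet(mysql.MySQLFlavor, uuid+":1-13")
    // GTID set from the task config vs. the one recorded in the syncer checkpoint.
    fromConfig, _ := mysql.ParseGTIDSet(mysql.MySQLFlavor, uuid+":1-11")
    fromCheckpoint, _ := mysql.ParseGTIDSet(mysql.MySQLFlavor, uuid+":1-14")

    // The source can only serve a replica whose executed GTID set already
    // covers everything it has purged.
    fmt.Println(fromConfig.Contain(purged))     // false -> ERROR 1236, as the precheck reports
    fmt.Println(fromCheckpoint.Contain(purged)) // true  -> validating the checkpoint GTID would pass
}

This mirrors the report above: the precheck fails because it validates the config GTID (1-11), while validating the checkpoint GTID (1-14) would pass.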
