Feature Request: Backup bin-log data during full backup with builtinbackupengine #17667

mattrobenolt opened this issue Jan 30, 2025 · 0 comments


mattrobenolt commented Jan 30, 2025

Feature Description

Backups with the builtinbackupengine currently operate in one of two modes: a full backup or an incremental backup.

Binlogs are naturally retained and replayed during an incremental backup, but a full backup excludes them, and the restore path assumes there are no binlogs at all.

Proposal

My proposal is to introduce new flags to opt in to (or out of) this behavior for the builtinbackupengine: one on the backup side to indicate "yes, include binlogs", and one on the restore side to say "yes, if there are binlogs, please restore with them." The restore-side flag exists out of caution: if a backup contains binlogs but for some reason we want to restore without them, we can choose not to apply them.
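As a sketch of the opt-in surface, the two flags might look like the following. The flag names here are hypothetical placeholders, not final spellings; both default to false so existing behavior is unchanged unless explicitly opted into:

```go
package main

import (
	"flag"
	"fmt"
)

// Hypothetical flag names for illustration only; the final names
// would be settled during review.
var (
	includeBinlogs = flag.Bool("builtinbackup-include-binlogs", false,
		"include binlog files when taking a full backup")
	restoreBinlogs = flag.Bool("builtinbackup-restore-binlogs", false,
		"if the backup contains binlog files, restore them too")
)

func main() {
	flag.Parse()
	fmt.Println(*includeBinlogs, *restoreBinlogs)
}
```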

Backing up binlogs comes with two challenges: the binlog filenames, and the binlog index. To the best of my knowledge, and as confirmed by a POC, both are purely cosmetic and don't affect the binlog contents themselves.

Each binlog file is named based on the --log-bin[=base_name] configuration, resulting in files such as .../bin-logs/vt_{tabletid}.0000{1,2,3} and a binlog index file such as .../bin-logs/vt_{tabletid}.index.

This strict naming poses a problem for backup and restore, since a full restore will very likely land on a new tablet id, in which case the files backed up as-is will have the wrong filenames, and the index will be wrong.

My POC adds a new "Base" type for each FileEntry within a backup, e.g.

	// the four bases for files within a backup
	const (
		backupInnodbDataHomeDir     = "InnoDBData"
		backupInnodbLogGroupHomeDir = "InnoDBLog"
		backupBinlogDir             = "BinLog"
		backupData                  = "Data"
	)

Each of these base types indicates, for a given FileEntry within the MANIFEST, how to handle it, and behavior can be switched on them.

I plan to introduce a new type equivalent to a "stripped binlog file". The idea is to store a FileEntry such as:

	FileEntry{
		Base: "BinLogStripped",
		Name: "00001",
	}

This new FileEntry type stores only the suffix of the binlog, which is the only unique piece of information necessary. We strip the prefix during backup, and during restore we use this new Base type to apply a new prefix based on the Mycnf.BinLogPath, restoring the files to the correct location.
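The strip/re-apply step could be sketched roughly like this. The helper names are hypothetical, and the paths follow the vt_{tabletid}.0000N naming described above:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// stripBinlogSuffix returns only the numeric suffix of a binlog file,
// e.g. "vt_0000000101.000003" -> "000003". The prefix is derived from
// the tablet's log-bin base name, so it is not portable across tablets.
// (Assumes the filename contains at least one "." separator.)
func stripBinlogSuffix(binlogPath string) string {
	base := filepath.Base(binlogPath)
	return base[strings.LastIndex(base, ".")+1:]
}

// restoreBinlogName re-applies the restoring tablet's own binlog base
// path (from its my.cnf, e.g. Mycnf.BinLogPath) to a stripped suffix
// stored in the backup MANIFEST.
func restoreBinlogName(binLogPath, suffix string) string {
	return fmt.Sprintf("%s.%s", binLogPath, suffix)
}

func main() {
	suffix := stripBinlogSuffix("/vt/vt_0000000101/bin-logs/vt_0000000101.000003")
	fmt.Println(suffix) // 000003
	// Restored under the new tablet's own base path:
	fmt.Println(restoreBinlogName("/vt/vt_0000000202/bin-logs/vt_0000000202", suffix))
}
```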

The index is also explicitly excluded from backups, and we rebuild it after the restore has finished. Fortunately, the index format is just a newline-separated list of binlog file paths, so it is easy to regenerate from the binlogs we restore: we know exactly what the index must contain, and we write it out as a generated file.

Restoring just needs to become aware that binlogs were restored, so it can change how it resumes replication. Right now, restoring from a backup explicitly runs RESET MASTER and sets the GTID position based on the MANIFEST data. When restoring with binlogs retained, we need to skip this, since RESET MASTER instructs MySQL to delete all binlogs and start over; with the binlogs restored, this extra reset step isn't needed.
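That branch in the restore path could be expressed as a small decision helper. This is a sketch with a hypothetical function name, not the actual Vitess restore code:

```go
package main

import "fmt"

// resumeReplicationSQL sketches which statements the restore path would
// run afterwards. When binlogs were restored, RESET MASTER must be
// skipped: it deletes all binlogs and wipes the GTID state they carry,
// and MySQL can instead recover its GTID state from the restored
// binlogs themselves.
func resumeReplicationSQL(restoredWithBinlogs bool, manifestGTID string) []string {
	if restoredWithBinlogs {
		// Nothing to reset; the restored binlogs carry the state.
		return nil
	}
	return []string{
		"RESET MASTER",
		fmt.Sprintf("SET GLOBAL gtid_purged = '%s'", manifestGTID),
	}
}

func main() {
	fmt.Println(resumeReplicationSQL(false, "uuid:1-100"))
	fmt.Println(resumeReplicationSQL(true, "uuid:1-100"))
}
```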

I am very open to criticism of this approach, and I have a working POC that I'll put up as a WIP pull request to start working through implementation details.

Use Case(s)

Restoring from a backup that contains binlogs is beneficial when data is being streamed out of the cluster via GTID-based replication.

If we have an expectation of "X days" of binlog retention, but we roll all nodes in a cluster over a short period of time, each one restoring from a backup, we lose our binlog history, since all new nodes come up with minimal to no history attached.

This behavior can break systems that expect to start replicating from some point in the past, since no node has access to old enough binlogs.

mattrobenolt added the Needs Triage label on Jan 30, 2025
mattlord added the Type: Feature and Component: Backup and Restore labels, and removed Needs Triage, on Jan 31, 2025