Today, backups with the builtinbackupengine operate in one of two modes: a full backup or an incremental backup.
An incremental backup retains and replays binlogs, but a full backup excludes them, and a restore assumes that there are no binlogs at all.
Proposal
My proposal is to introduce new flags to opt in to or out of this behavior for the builtinbackupengine: one on the backup side to indicate "yes, include binlogs", and one on the restore side to say "yes, if there are binlogs, please restore with them." The restore-side flag exists out of caution: if we have backups that contain binlogs and for some reason want to restore without them, we can choose not to restore them.
Backing up binlogs comes with two challenges: the binlog filenames and the binlog index. To the best of my knowledge, and based on a POC, both are purely cosmetic and don't affect the binlogs themselves.
Each binlog file is named after the --log-bin[=base_name] configuration, resulting in files such as .../bin-logs/vt_{tabletid}.0000{1,2,3} and a binlog index file such as .../bin-logs/vt_{tabletid}.index.
This strict naming is a problem for backup and restore, since a full restore will very likely restore into a new tablet id, in which case the files backed up as-is will have the wrong filenames and the index will be wrong.
My POC adds a new "Base" type for each FileEntry within a backup, e.g.
```go
// the three bases for files to restore
backupInnodbDataHomeDir     = "InnoDBData"
backupInnodbLogGroupHomeDir = "InnoDBLog"
backupBinlogDir             = "BinLog"
backupData                  = "Data"
```
Each of these base types indicates how a given FileEntry within the MANIFEST should be handled, and restore behavior can switch on them.
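As a rough illustration of switching on the base, the sketch below maps an entry to a destination directory. The `fileEntry` and `restoreDirs` types and the `targetPath` function are hypothetical, simplified stand-ins, not the actual Vitess types; only the four base strings come from the list above.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// Base values as listed in the proposal.
const (
	backupInnodbDataHomeDir     = "InnoDBData"
	backupInnodbLogGroupHomeDir = "InnoDBLog"
	backupBinlogDir             = "BinLog"
	backupData                  = "Data"
)

// restoreDirs is a hypothetical stand-in for the directories that the
// restoring tablet's my.cnf would provide.
type restoreDirs struct {
	InnodbDataHomeDir     string
	InnodbLogGroupHomeDir string
	BinLogDir             string
	DataDir               string
}

// fileEntry is a simplified, hypothetical MANIFEST entry.
type fileEntry struct {
	Base string // one of the base constants above
	Name string // path relative to the base directory
}

// targetPath picks the destination directory by switching on the entry's base.
func targetPath(d restoreDirs, fe fileEntry) (string, error) {
	switch fe.Base {
	case backupInnodbDataHomeDir:
		return filepath.Join(d.InnodbDataHomeDir, fe.Name), nil
	case backupInnodbLogGroupHomeDir:
		return filepath.Join(d.InnodbLogGroupHomeDir, fe.Name), nil
	case backupBinlogDir:
		return filepath.Join(d.BinLogDir, fe.Name), nil
	case backupData:
		return filepath.Join(d.DataDir, fe.Name), nil
	default:
		return "", fmt.Errorf("unknown base %q for file %q", fe.Base, fe.Name)
	}
}

func main() {
	d := restoreDirs{DataDir: "/vt/data", BinLogDir: "/vt/bin-logs"}
	p, err := targetPath(d, fileEntry{Base: backupData, Name: "db/t1.ibd"})
	fmt.Println(p, err)
}
```

A new base for stripped binlog files would be one more case in this switch.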
I plan to introduce a new Base type that is equivalent to a "stripped binlog file", with its own kind of FileEntry.
This new FileEntry type stores only the suffix of the binlog, which is the one unique piece of information we need. We strip the prefix during backup, and during restore we use the new Base type to apply a new prefix based on Mycnf.BinLogPath, which puts the files back in the correct location.
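A minimal sketch of the strip and re-apply steps, assuming the binlog suffix is the all-digit extension after the last dot and that the new tablet's binlog base path (the role Mycnf.BinLogPath plays in the proposal) has no extension of its own. The function names are hypothetical and are not taken from the POC.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// stripBinlogPrefix returns only the numeric suffix of a binlog file,
// e.g. ".../vt_100.000002" yields "000002". Anything whose extension is
// not all digits (such as the .index file) is rejected.
func stripBinlogPrefix(path string) (string, error) {
	suffix := strings.TrimPrefix(filepath.Ext(filepath.Base(path)), ".")
	if suffix == "" {
		return "", fmt.Errorf("%q has no binlog suffix", path)
	}
	for _, r := range suffix {
		if r < '0' || r > '9' {
			return "", fmt.Errorf("%q is not a binlog file", path)
		}
	}
	return suffix, nil
}

// applyBinlogPrefix rebuilds the full file path for the restoring tablet.
// binLogPath is assumed to be the base path without a suffix,
// e.g. ".../bin-logs/vt_200".
func applyBinlogPrefix(binLogPath, suffix string) string {
	return binLogPath + "." + suffix
}

func main() {
	suffix, err := stripBinlogPrefix("/old/bin-logs/vt_100.000002")
	if err != nil {
		panic(err)
	}
	fmt.Println(applyBinlogPrefix("/new/bin-logs/vt_200", suffix))
}
```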
The index is also explicitly excluded from backups, and we rebuild it after the restore has finished. Fortunately, the index format is just a newline-separated list of all the binlog file paths, so it is easy to regenerate: we know which binlogs we restored, so we know what belongs in the index, and we write it out as a generated file.
Restoring just needs to become aware that we've restored with binlogs, so that it can change how it resumes replication. Right now, the restore unconditionally runs RESET MASTER and sets the GTID position based on the MANIFEST data. When restoring with binlogs retained, we need to avoid doing this, because RESET MASTER instructs MySQL to delete all the binlogs and start over. A restore that keeps the binlogs doesn't need this extra reset step.
I am very open to criticism of this approach, and I have a working POC that I'll put up as a WIP pull request to start working through implementation details.
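Regenerating the index can be sketched as below. The function name is hypothetical, and sorting by path assumes all restored files share one prefix and use zero-padded suffixes of equal width, so that lexical order matches sequence order.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// buildBinlogIndex renders the index contents: one binlog file path per
// line, in sequence order, each line terminated by a newline.
func buildBinlogIndex(paths []string) string {
	sorted := append([]string(nil), paths...) // copy so the caller's slice is untouched
	sort.Strings(sorted)
	var b strings.Builder
	for _, p := range sorted {
		b.WriteString(p)
		b.WriteByte('\n')
	}
	return b.String()
}

func main() {
	fmt.Print(buildBinlogIndex([]string{
		"/new/bin-logs/vt_200.000002",
		"/new/bin-logs/vt_200.000001",
	}))
}
```

The resulting string would then be written to the index file next to the restored binlogs.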
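The change in restore behavior could look roughly like the sketch below. The function name and the `restoredBinlogs` parameter are hypothetical, and using `SET GLOBAL gtid_purged` to set the position is an assumption about how the existing path works, not something confirmed here.

```go
package main

import "fmt"

// postRestoreStatements returns the SQL to run before resuming replication.
// When binlogs were restored, nothing is reset, because RESET MASTER would
// delete the binlogs that were just put back.
func postRestoreStatements(restoredBinlogs bool, manifestGTID string) []string {
	if restoredBinlogs {
		return nil
	}
	return []string{
		"RESET MASTER",
		// Assumed way of setting the position recorded in the MANIFEST.
		fmt.Sprintf("SET GLOBAL gtid_purged = '%s'", manifestGTID),
	}
}

func main() {
	for _, stmt := range postRestoreStatements(false, "<gtid set from MANIFEST>") {
		fmt.Println(stmt)
	}
}
```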
Use Case(s)
Restoring from a backup that contains binlogs is beneficial in cases where data streaming takes place by means of GTID-based replication.
If we expect "X days" of binlog retention but roll all nodes in a cluster over a short period of time, each one restoring from a backup, we lose our binlog history, since every new node comes up with little or no history attached.
This behavior can break systems that expect to start replicating from some historical time, since no nodes have access to old enough bin-logs.