[RFE] Support keepers that will never become master/sync #696
Comments
@lawrencejones The logic looks good to me. I'm not sure about your use case (can you provide an example?). Just a few details/questions:
Yes.
You mean? If so
Sure! We're building a new backup and disaster recovery system for Postgres at GoCardless. One of the components of that system would be a stolon keeper connected to each of our main clusters, likely of a much lower machine class (n2-highmem-32 is a bit extreme for a replica), with a persistent disk attached that we'll configure GCE to snapshot at regular intervals. We don't want to snapshot any of the replicas that may become the master, as snapshots can impact performance, and it would be odd for just one of those three nodes to be arbitrarily chosen and potentially be down during an operation. We also want to provision this replica slightly differently, with (for example) very frequent checkpointing, so that our disk snapshots require a minimal amount of time to recover at startup.
I didn't! I was actually talking about. Right now
You should instead use
Ah, we didn't realise that. We'll have a PR for this coming in today. Thanks for the help!
Hi @lawrencejones! I think this feature could be incorporated into setting a keeper's priority (#622), by specifying dedicated values for nodes that should never become a synchronous replica or master. That feature is more general because it also allows making a directed switchover from an async replica to a sync one, or from a replica to master, by issuing a command. What do you think about it?
We have a use case for creating keeper nodes that should be managed by the stolon cluster but never become eligible for promotion. These additional nodes may have far fewer resources than the rest of the cluster, which is why we'd never want them to become primary, or may have different configuration parameters applied (like frequent checkpointing intervals).
Proposal
The keeper should support two new flags, `--never-master` and `--never-synchronous-replica`. The values from these command line flags seem most appropriately placed in the `KeeperSpec` struct, if updated with `NeverMaster, NeverSynchronousReplica bool` fields. The keeper is not able to directly manipulate the spec, so the suggestion is to (see the sketch after this list):

- Add the new fields to the `KeeperInfo` struct
- Update the `updateKeepersStatus` method to extract the `NeverMaster` and `NeverSynchronousReplica` fields into the `KeeperSpec` objects inside `ClusterData`
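To make the shape of the change concrete, here's a minimal sketch of the struct additions, assuming stolon's existing `KeeperInfo`/`KeeperSpec` types in its `cluster` package; the JSON tags and any surrounding fields are illustrative guesses rather than the project's actual definitions:

```go
package cluster

// Sketch only: illustrative additions to stolon's existing cluster types.
// JSON tags and any omitted fields are assumptions, not the real definitions.

// KeeperInfo is reported by each keeper; the two new booleans would be
// populated from the proposed --never-master / --never-synchronous-replica
// command line flags.
type KeeperInfo struct {
	// ... existing keeper-reported fields ...
	NeverMaster             bool `json:"neverMaster,omitempty"`
	NeverSynchronousReplica bool `json:"neverSynchronousReplica,omitempty"`
}

// KeeperSpec is managed by the sentinel; updateKeepersStatus would copy the
// values across so that updateCluster can honour them when orienting the cluster.
type KeeperSpec struct {
	NeverMaster             bool `json:"neverMaster,omitempty"`
	NeverSynchronousReplica bool `json:"neverSynchronousReplica,omitempty"`
}
```

A keeper that should never be promoted would then simply be started with both flags, e.g. `stolon-keeper --never-master --never-synchronous-replica` alongside its usual options.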
The `updateKeepersStatus` method is run on every `clusterSentinelCheck`, thereby ensuring we populate the `KeeperSpec`s before ever calling `updateCluster` and making decisions about cluster orientation.
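As a rough illustration of that copy step, something like the following could run as part of `updateKeepersStatus`; the function name, signature, map shape, and import path here are hypothetical, and only the two new fields come from the proposal:

```go
package sentinel

// Sketch only: a hypothetical helper showing how the sentinel could propagate
// the keeper-reported constraints into the KeeperSpec objects held in
// ClusterData. The import path, map shape, and field layout are assumptions.

import "github.com/sorintlab/stolon/internal/cluster"

func propagateKeeperConstraints(cd *cluster.ClusterData, keepersInfo map[string]*cluster.KeeperInfo) {
	for uid, ki := range keepersInfo {
		k, ok := cd.Keepers[uid]
		if !ok || k.Spec == nil {
			continue
		}
		// Copy the promotion constraints so updateCluster can honour them.
		k.Spec.NeverMaster = ki.NeverMaster
		k.Spec.NeverSynchronousReplica = ki.NeverSynchronousReplica
	}
}
```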
Finally, we'd modify the Sentinel to remove any `--never-synchronous-replica` keepers from those considered for synchronous standbys (by applying filtering here) and make `findBestNewMasters` remove `--never-master` keepers.
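For the filtering side, helpers along these lines could be applied where synchronous standbys are chosen and inside `findBestNewMasters`; again this is a sketch against the proposed `KeeperSpec` fields, with hypothetical helper names and an assumed import path, not the sentinel's real code:

```go
package sentinel

// Sketch only: hypothetical filters the sentinel could apply before selecting
// synchronous standbys or ranking new master candidates. The import path and
// the exact Keeper/KeeperSpec shapes are assumptions based on the proposal.

import "github.com/sorintlab/stolon/internal/cluster"

// eligibleForSync drops keepers started with --never-synchronous-replica.
func eligibleForSync(keepers []*cluster.Keeper) []*cluster.Keeper {
	var out []*cluster.Keeper
	for _, k := range keepers {
		if k.Spec != nil && k.Spec.NeverSynchronousReplica {
			continue
		}
		out = append(out, k)
	}
	return out
}

// eligibleForMaster drops keepers started with --never-master;
// findBestNewMasters would apply this (or an equivalent check) before
// ranking candidates.
func eligibleForMaster(keepers []*cluster.Keeper) []*cluster.Keeper {
	var out []*cluster.Keeper
	for _, k := range keepers {
		if k.Spec != nil && k.Spec.NeverMaster {
			continue
		}
		out = append(out, k)
	}
	return out
}
```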
Anticipated questions

Why `KeeperInfo` -> `KeeperSpec`?

Not being the original author of this code, it's not fully clear to us what the intent of `KeeperInfo` vs `KeeperState` vs `KeeperSpec` vs `DBSpec` is. Our understanding is that:

- `KeeperInfo` is information about the keeper that is managed by the keeper
- `KeeperSpec` (while presently empty) should contain specifications about the keeper, managed by the Sentinel
- `KeeperStatus` is up-to-date information about the current keeper state, managed by the Sentinel
- `DBSpec` is a specification for a database, managed by the Sentinel
- `DBStatus` is an up-to-date reflection of database status, managed by the Sentinel

It feels most natural to specify the keeper constraints as keeper flags, but it would be good to confirm it's not weird to be pushing this into the `KeeperInfo` and having the Sentinel draw them into the `KeeperSpec`. @sgotti will be best placed to answer this?

Can we already do this?
While you can do this using pitr mode and external standbys, Stolon provides a load of additional functionality via its `resync` flow to fully manage a standby, ensuring that whatever happens in the cluster, the standby will be resync'd to match the selected primary. We'd also prefer to model these nodes as part of the same cluster, sharing the same authentication material. Having Stolon manage this would be much better than leaning on the existing external standby configuration.