Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to disable sync replication when there're not enough ready standbys. #841

Merged
merged 1 commit into from
Sep 16, 2021
Merged

Add ability to disable sync replication when there're not enough ready standbys. #841

merged 1 commit into from
Sep 16, 2021

Conversation

rg2011
Copy link

@rg2011 rg2011 commented Aug 26, 2021

Fixes issue #330 by adding support for MinSynchronousStandbys == 0, which removes fake sync replicas from the replica list.

Changes to cluster config:

  • Removed check for MinSynchronousStandbys > 0

Changes to sentinel:

  • Skip adding the fakeStandbyName sync standby when the available standbys are 0.

Changes to ha_test:

  • Duplicated tests that depend on sync replication stopping writes when the sync replica is down.

@rg2011
Copy link
Author

rg2011 commented Aug 28, 2021

Last test run errors look like etcd timeouts, unrelated to the code itself:

proxy_test.go:164: error waiting on store up: timeout

Could someone with permissions please rerun/confirm?

@sgotti
Copy link
Member

sgotti commented Sep 1, 2021

@rg2011 Thanks for your PR. #330 was a very old discussion and many things in the code and sentinel logic changed from that time. So:

  • I'm not sure I can totally understand the changes to the sentinel logic but they affect multiple parts. Won't be possible to just allow setting MinSynchronousStandbys to 0 without doing any change to the sentinel logic and avoiding adding a new option (strictSyncRepl)?

@rg2011
Copy link
Author

rg2011 commented Sep 1, 2021

@rg2011 Thanks for your PR. #330 was a very old discussion and many things in the code and sentinel logic changed from that time. So:

* I'm not sure I can totally understand the changes to the sentinel logic but they affect multiple parts. Won't be possible to just allow setting MinSynchronousStandbys to 0 without doing any change to the sentinel logic and avoiding adding a new option (strictSyncRepl)?

Hi, thanks for reviewing a PR for such an old issue :D. The extra logic is mostly for avoiding keeping previous / unhealthy standbys in the list of syncStandbys, to speed up failover from sync to async. But honestly I have not tested what's the actual speedup, if any. so I'm ok with trying just MinSynchronousStandbys = 0 first, let me update this.

@sgotti sgotti added this to the v0.18.0 milestone Sep 3, 2021
@rg2011
Copy link
Author

rg2011 commented Sep 6, 2021

Hi @sgotti . Finished updating the PR and the tests passed, so I think it is ready for review.

@@ -24,7 +24,7 @@ Some options in a running cluster specification can be changed to update the des
| maxStandbysPerSender | max number of standbys for every sender. A sender can be a master or another standby (with cascading replication). | no | uint16 | 3 |
| maxStandbyLag | maximum lag (from the last reported master state, in bytes) that an asynchronous standby can have to be elected in place of a failed master. | no | uint32 | 1MiB |
| synchronousReplication | use synchronous replication between the master and its standbys | no | bool | false |
| minSynchronousStandbys | minimum number of required synchronous standbys when synchronous replication is enabled (only set this to a value > 1 when using PostgreSQL >= 9.6) | no | uint16 | 1 |
| minSynchronousStandbys | minimum number of required synchronous standbys when synchronous replication is enabled (only set this to a value != 1 when using PostgreSQL >= 9.6) | no | uint16 | 1 |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a note about the behavior when the value is 0.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See 21f5e31

@sgotti
Copy link
Member

sgotti commented Sep 8, 2021

@rg2011 Thanks! Great work. Just a little request on adding a note about minSynchronousStandbys == 0 in the doc. Can you please squash in a single commit?

@rg2011
Copy link
Author

rg2011 commented Sep 9, 2021

Squished into a single commit in #843. Please move discussion to that PR.

@rg2011 rg2011 closed this Sep 9, 2021
@sgotti sgotti reopened this Sep 9, 2021
@rg2011
Copy link
Author

rg2011 commented Sep 13, 2021

Learnt how to do the rebase on the same branch :) . Rebased to a single commit.

@sgotti
Copy link
Member

sgotti commented Sep 13, 2021

Learnt how to do the rebase on the same branch :) . Rebased to a single commit.

Thanks. After thinking a bit more about this, someone could want to have MinSynchronousStandbys to a number greater than one and disable sync repl when there're not enough ready standbys. So a strictSyncRepl parameter instead of accepting MinSynchronousStandbys == 0 could be better. I think we can do this without many changes to the sentinel logic like in the first PR iteration, just when choosing when to add a fake standby or not like in your last PR.

You should also rename the commit to something like:

*: Add option to disable sync replication when there're not enough ready standbys.

@rg2011
Copy link
Author

rg2011 commented Sep 13, 2021

Thanks. After thinking a bit more about this, someone could want to have MinSynchronousStandbys to a number greater than one and disable sync repl when there're not enough ready standbys. So a strictSyncRepl parameter instead of accepting MinSynchronousStandbys == 0 could be better. I think we can do this without many changes to the sentinel logic like in the first PR iteration, just when choosing when to add a fake standby or not like in your last PR.

You should also rename the commit to something like:

*: Add option to disable sync replication when there're not enough ready standbys.

I'm not sure about that use case. As long as there are ready replicas to add to the sync list, the sentinel will target for MaxSynchronousStandbys, it won't stop at MinSynchronousStandbys. So I don't see the benefit in setting MinSynchronousStandbys to anything higher than the absolute minimum number of sync replicas you will tolerate in a worst case scenario. Unless we change the sentinel logic, then a flag would make sense.

@sgotti
Copy link
Member

sgotti commented Sep 14, 2021

I'm not sure about that use case. As long as there are ready replicas to add to the sync list, the sentinel will target for MaxSynchronousStandbys, it won't stop at MinSynchronousStandbys. So I don't see the benefit in setting MinSynchronousStandbys to anything higher than the absolute minimum number of sync replicas you will tolerate in a worst case scenario. Unless we change the sentinel logic, then a flag would make sense.

Yeah, it's probably a not really common use case, so we can go ahead with the current approach and improve it in future if someone wants to implement it.

So just please add some doc on the behavior of MinSynchronousStandbys == 0 in the documentation and rename the commit to something like:

*: Add ability to disable sync replication when there're not enough ready standbys.

@sgotti sgotti changed the title Added support for "strictSyncRepl" cluster parameter Add ability to disable sync replication when there're not enough ready standbys. Sep 14, 2021
@rg2011
Copy link
Author

rg2011 commented Sep 14, 2021

So just please add some doc on the behavior of MinSynchronousStandbys == 0 in the documentation and rename the commit to something like:

*: Add ability to disable sync replication when there're not enough ready standbys.

Added a brief note about MinSynchronousStandbys == 0 in 21f5e31 . Didn't squash and rename the commit yet, in case you think we should add or remove something else from the doc.

@sgotti
Copy link
Member

sgotti commented Sep 15, 2021

@rg2011 Docs seems good, can you please also some brief note on the behavior of MinSynchronousStandbys == 0 in doc/cluster_spec.md and then squash and rename the commit? Thanks.

@rg2011 rg2011 requested a review from sgotti September 15, 2021 13:05
@rg2011
Copy link
Author

rg2011 commented Sep 15, 2021

Updated doc/cluster_spec.md and squashed. Glad to see this making it into milestone v0.18.0. Thanks!

Copy link
Member

@sgotti sgotti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rg2011 Thanks a lot! LGTM. Merging.

@sgotti sgotti merged commit fc23394 into sorintlab:master Sep 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants