stolonctl: implement postgres reload/restart #88

Closed
sgotti opened this issue Oct 30, 2015 · 4 comments · Fixed by #568

Comments

@sgotti
Member

sgotti commented Oct 30, 2015

No description provided.

@sgotti
Member Author

sgotti commented May 11, 2018

Closing since it's old and without any context.

@sgotti sgotti closed this as completed May 11, 2018
@prabhu43
Contributor

prabhu43 commented Sep 4, 2018

Hi @sgotti

We need a postgres restart feature so that updated pgParameters such as max_connections take effect.

I went through issue #255 and got the idea of restarting the postgres instances of the slave keepers first and then restarting the postgres instance of the master. On restarting the postgres of the master, the sentinel will elect another healthy keeper as master.

We thought of implementing this feature as follows:

  1. Introduce a flag PgRestart in the keeper status in cluster data.

  2. On executing stolonctl pgRestart, the following should happen (done by the stolonctl CLI):
    a. Update cluster data: set .Keepers[keeperId].Status.PgRestart: true for the slave keepers.
    b. This restarts Postgres on all slaves with the updated ClusterSpecification from ClusterData.
    c. The CLI waits, with a defined timeout, for at least one slave to be restarted and marked DBs.Healthy and Keeper.Healthy.
    d. If at least one slave is restarted and healthy, restart postgres of the master keeper:
      • Check if replication is in sync.
      • ForceFail the existing master keeper. This would trigger a re-election.
      • Update cluster data: set .Keepers[keeperId].Status.PgRestart: true for the master keeper.
    e. If none of the slave keepers are healthy (within the defined timeout), exit with non-zero status.

  3. The following should happen on the keepers. If Status.PgRestart is true:
    • Set Status.PgRestart to false.
    • Restart the postgres instance.
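The steps above could be sketched roughly as follows. This is only an illustration of the proposed flow, not stolon code: the Keeper struct, rollingRestart, keeperTick and the slaveHealthy callback are hypothetical names, the single Healthy field stands in for both DBs.Healthy and Keeper.Healthy, and the replication sync check and ForceFail from step 2.d are left out.

```go
package main

import (
	"errors"
	"fmt"
)

// Keeper is a hypothetical, simplified view of a keeper entry in cluster data.
type Keeper struct {
	ID        string
	Master    bool
	Healthy   bool // stands in for both DBs.Healthy and Keeper.Healthy
	PgRestart bool // the proposed Status.PgRestart flag
}

// ErrNoHealthySlave corresponds to step 2.e (the CLI exits with non-zero status).
var ErrNoHealthySlave = errors.New("no slave became healthy before the timeout")

// rollingRestart flags the slaves first (2.a), asks slaveHealthy whether at
// least one slave came back healthy within the timeout (2.c), and only then
// flags the master (2.d). slaveHealthy is a hypothetical callback that hides
// the polling and the timeout.
func rollingRestart(keepers []*Keeper, slaveHealthy func() bool) error {
	for _, k := range keepers {
		if !k.Master {
			k.PgRestart = true
		}
	}
	if !slaveHealthy() {
		return ErrNoHealthySlave
	}
	for _, k := range keepers {
		if k.Master {
			k.PgRestart = true
		}
	}
	return nil
}

// keeperTick is the keeper side (step 3): clear the flag, then restart.
func keeperTick(k *Keeper, restart func()) {
	if k.PgRestart {
		k.PgRestart = false
		restart()
	}
}

func main() {
	keepers := []*Keeper{{ID: "keeper0", Master: true}, {ID: "keeper1"}, {ID: "keeper2"}}
	err := rollingRestart(keepers, func() bool { return true })
	fmt.Println("error:", err)
	for _, k := range keepers {
		keeperTick(k, func() { fmt.Println("restarting postgres on", k.ID) })
	}
}
```

Clearing the flag before restarting means a keeper that crashes mid-restart will not restart a second time on its next tick; whether that is the desired behaviour is a design choice the proposal leaves open.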

This would avoid downtime as well, but there are a few gotchas:

  1. For some Postgres parameters, if the value is changed on a slave before it is changed on the master, the slave will not restart (a limitation of hot_standby), e.g. when decreasing max_connections. In this case, 2.e will always happen. For this, we can provide a --no-wait option for the pgRestart command, with which the master database will be restarted without waiting for the slaves to be healthy.
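A tool could try to detect this case up front. The sketch below is an assumption about how that might look, not part of the proposal: blocksSlaveFirst is a hypothetical helper, the parameter list is the set for which the PostgreSQL hot standby documentation requires the standby's value to be at least the primary's, and parameters missing from either map are simply skipped rather than compared against their defaults.

```go
package main

import "fmt"

// standbySensitive lists parameters whose value on a hot standby must be
// greater than or equal to the value on the primary.
var standbySensitive = []string{
	"max_connections",
	"max_locks_per_transaction",
	"max_prepared_transactions",
	"max_wal_senders",
	"max_worker_processes",
}

// blocksSlaveFirst returns the sensitive parameters that are being decreased.
// If the result is non-empty, restarting the slaves before the master would
// leave them with a lower value than the master still has.
func blocksSlaveFirst(oldParams, newParams map[string]int) []string {
	var blocked []string
	for _, name := range standbySensitive {
		oldV, okOld := oldParams[name]
		newV, okNew := newParams[name]
		if okOld && okNew && newV < oldV {
			blocked = append(blocked, name)
		}
	}
	return blocked
}

func main() {
	oldP := map[string]int{"max_connections": 200, "max_worker_processes": 8}
	newP := map[string]int{"max_connections": 100, "max_worker_processes": 16}
	fmt.Println(blocksSlaveFirst(oldP, newP)) // prints [max_connections]
}
```

A non-empty result would be the situation where something like the suggested --no-wait option (or a master-first order) is needed.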

Any thoughts on this?

@sgotti
Member Author

sgotti commented Sep 7, 2018

@prabhu43 I reopened the issue (it was closed and I was losing your comment).

Your proposal is basically an operator that handles a long running transaction. It requires a lot of logic to handle all the possible failures (it implements one of the possible workflows but there can be others). And as you said there are different gotchas that are difficult to know ahead of time, since you would have to reimplement all the postgresql parameter checking logic.

For this reason I won't add a command like stolonctl pgrestart that will be a blocking command carrying all of this logic.
Perhaps this could be implemented (in go or as shell script or whatever) as a contrib script/tool outside stolonctl.

Some notes

  • ideally a keeper can automatically discover (and report) if an instance needs a restart querying the pg_parameters table. It's simply not yet implemented.
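As a rough illustration of that note: PostgreSQL reports this through the pg_settings view, whose pending_restart column (available since 9.5) is true for parameters that were changed in the configuration file but only take effect after a restart; this is presumably the view meant above. The sketch below is not stolon code. The Setting type and needsRestart are hypothetical, and the rows are passed in directly so the example runs without a database; a keeper would obtain them by running the query after a configuration reload.

```go
package main

import "fmt"

// pendingRestartQuery is the query a keeper could run against its local
// instance after reloading the configuration.
const pendingRestartQuery = `SELECT name, pending_restart FROM pg_settings`

// Setting mirrors the two pg_settings columns selected above.
type Setting struct {
	Name           string
	PendingRestart bool
}

// needsRestart reports whether any parameter is waiting for a restart and
// returns the names of those parameters so they can be reported.
func needsRestart(settings []Setting) (bool, []string) {
	var names []string
	for _, s := range settings {
		if s.PendingRestart {
			names = append(names, s.Name)
		}
	}
	return len(names) > 0, names
}

func main() {
	// Hand-written rows standing in for the result of pendingRestartQuery.
	rows := []Setting{
		{Name: "max_connections", PendingRestart: true},
		{Name: "work_mem", PendingRestart: false},
	}
	restart, names := needsRestart(rows)
	fmt.Println(pendingRestartQuery)
	fmt.Println(restart, names)
}
```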

On restarting the postgres of master, sentinel will elect another healthy keeper as master

To be precise, restarting the master keeper doesn't imply that a new primary will be elected: if the restart is fast enough (usually) and doesn't fail due to wrong parameters, the sentinel won't detect the master as failed and won't elect a new master.

@sgotti sgotti reopened this Sep 7, 2018
@aswinkarthik
Contributor

aswinkarthik commented Sep 7, 2018

if the restart is fast enough (usually) and doesn't fail due to wrong parameters the sentinel won't detect the master as failed and won't elect a new master.

@prabhu43 and I tried exactly this in PR #561.

We just blindly restarted all 3 postgres instances and tested it out. They restarted very fast with minimal downtime (less than 1 second), but the restart was triggered from a CLI command, stolonctl pgrestart. We could also change this so that stolon-keeper itself decides whether a restart is needed and restarts pg if necessary. What do you think?
