Retry unsuccessful failover on unschedulable nodes #2749

Open
simonklb opened this issue Sep 4, 2024 · 0 comments

simonklb commented Sep 4, 2024

  • Which image of the operator are you using? ghcr.io/zalando/postgres-operator:v1.12.2
  • Where do you run it - cloud or metal? Kubernetes or OpenShift? Kubernetes
  • Are you running Postgres Operator in production? yes
  • Type of issue? Feature request

When you drain the node where the leader is running before any replica has become ready, the failover does not succeed. That is good. However, if a replica then becomes ready, the failover is never retried, and you have to uncordon the node and redo the drain for it to succeed.

I believe the relevant part is here:

// do nothing if the node should have already triggered an update or
// if only one of the label and the unschedulability criteria are met.
if !c.nodeIsReady(nodePrev) || c.nodeIsReady(nodeCur) {
	return
}

Would you be open to changing this behavior? Is there any harm in letting the failover retry if the node is still not ready?
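
To make the question concrete, here is a minimal sketch of the two behaviors. It is not the operator's code: the `node` type, its `ready` flag, and both function names are hypothetical stand-ins for whatever `c.nodeIsReady` evaluates. The retry variant also assumes that node update events (or resyncs) keep arriving while the node stays cordoned.

```go
package main

import "fmt"

// node is a hypothetical, simplified stand-in for a Kubernetes node object.
// The ready flag stands in for the result of c.nodeIsReady.
type node struct {
	ready bool
}

// shouldMoveCurrent mirrors the condition quoted above: it acts only on a
// ready -> not-ready transition and ignores every later update.
func shouldMoveCurrent(prev, cur node) bool {
	if !prev.ready || cur.ready {
		return false
	}
	return true
}

// shouldMoveWithRetry is an assumed alternative: it acts whenever the
// current state is not ready, so a later update event retries the failover.
func shouldMoveWithRetry(_, cur node) bool {
	return !cur.ready
}

func main() {
	healthy := node{ready: true}
	drained := node{ready: false}

	// First update after the drain starts: both variants act.
	fmt.Println(shouldMoveCurrent(healthy, drained), shouldMoveWithRetry(healthy, drained))

	// Later update while the node is still drained: only the retry variant acts.
	fmt.Println(shouldMoveCurrent(drained, drained), shouldMoveWithRetry(drained, drained))
}
```

With the retry variant, any later update on a still-unschedulable node would trigger another attempt, so the attempt would need to be a cheap no-op when no leader is left on that node.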
