Support running code in leader election before starting any controllers #607

Open
alvaroaleman opened this issue Sep 17, 2019 · 18 comments

@alvaroaleman (Member)

We have a couple of situations where we have functionality that needs to run on the elected leader only, but before any of the controllers are started. One example is part of our configuration, which used to be passed in as a file and is being migrated to a CRD. The migration code for this must run on an elected leader, but before any of our controllers get started.

Is this a use case worth considering for c-r?

/cc @xrstf

@DirectXMan12 (Contributor)

Maybe? It's probably not terrible to support. To be clear on the use case:

Does this need to run every time something gets the leader, or is it a "only one thing needs to run this, and it needs to be run before all the things start" situation?

Running under the normal leader election before controllers would work fine for both use cases, but I'm trying to figure out how terrible it is to just run separately before anything else starts.

@alvaroaleman (Member, Author)

> Does this need to run every time something gets the leader, or is it a "only one thing needs to run this, and it needs to be run before all the things start" situation?

Not completely sure I properly understand the question. It should always run after acquiring the leader lock and before starting anything else.

> Running under the normal leader election before controllers would work fine for both use cases, but I'm trying to figure out how terrible it is to just run separately before anything else starts.

I guess that would work. However, last time I checked there was no stepping down for elected leaders, which means the manager has to wait the $leaderLockLease duration after the migrations have run before it can start the controllers.
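
For context, "running it under the normal leader election" today essentially means registering the work as just another leader-election runnable on the manager. A minimal sketch, assuming a recent controller-runtime API (the migration body is a placeholder):

```go
package migration

import (
	"context"

	"sigs.k8s.io/controller-runtime/pkg/manager"
)

// configMigrator is a hypothetical runnable that performs the file-to-CRD migration.
type configMigrator struct{}

// Start is called by the manager once the leader lock has been acquired.
func (m *configMigrator) Start(ctx context.Context) error {
	// ... perform the file-to-CRD migration here (placeholder) ...
	return nil
}

// NeedLeaderElection marks the runnable as requiring leader election, so it
// only runs on the elected leader.
func (m *configMigrator) NeedLeaderElection() bool { return true }

// AddMigration registers the migrator with the manager alongside the controllers.
func AddMigration(mgr manager.Manager) error {
	return mgr.Add(&configMigrator{})
}
```

Note that runnables added this way are started concurrently with the controllers, so this alone does not guarantee the migration finishes first, which is exactly the gap this issue is about.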

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Dec 17, 2019
@gerred (Contributor) commented Jan 10, 2020

/assign gerred

@k8s-ci-robot (Contributor)

@gerred: GitHub didn't allow me to assign the following users: danderson.

Note that only kubernetes-sigs members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign danderson

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@gerred (Contributor) commented Jan 10, 2020

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label on Jan 10, 2020
@danderson

Adding a +1 to this, as discussed on Slack just now.

My use case is within MetalLB. Before MetalLB can start reconciling objects, it needs to build up some cluster-wide internal state (for example, which LoadBalancer IPs are currently assigned to various services).

The current implementation uses client-go and an internal wrapper library. In c-r terms: on startup, we generate reconcile callbacks for every object we're interested in, but the controller knows that the cache is not synced yet, so it just updates internal state and doesn't reconcile. When all the caches reach synced state, we fire a synthetic "synced" callback, which tells the rest of MetalLB to switch from "build internal state only" to "actively reconcile objects", and then we re-fire all the reconcile callbacks one more time to actually reconcile everything.

With c-r currently, the API guarantees that all caches are synced before the controller receives any reconcile callbacks. The API does not offer any place where I can do some preliminary work before entering the steady state of reconciling objects.

It's possible to implement what MetalLB needs by keeping an internal initialized flag in the controller. The first time a reconcile callback is invoked, the reconcile code can see that initialization isn't done, dump all objects of interest out of the cache, and initialize at that point. This is fine, but it's a little bit ugly and makes the code less clear: there's no separation in the code between one-time initialization and steady state.
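
A rough sketch of that workaround, assuming a typical controller-runtime reconciler; ServiceReconciler and the state-building step are illustrative stand-ins, not MetalLB's actual code:

```go
package example

import (
	"context"
	"sync"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// ServiceReconciler keeps an "initialized" flag and rebuilds its internal
// state on the first reconcile, as described above.
type ServiceReconciler struct {
	client.Client

	mu          sync.Mutex
	initialized bool
}

// ensureInitialized performs the one-time state rebuild, relying on the fact
// that caches are synced before Reconcile is ever called.
func (r *ServiceReconciler) ensureInitialized(ctx context.Context) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.initialized {
		return nil
	}
	var svcs corev1.ServiceList
	if err := r.List(ctx, &svcs); err != nil {
		return err // retried on the next reconcile
	}
	// ... rebuild internal allocation state from svcs (placeholder) ...
	r.initialized = true
	return nil
}

func (r *ServiceReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	if err := r.ensureInitialized(ctx); err != nil {
		return ctrl.Result{}, err
	}
	// ... steady-state reconciliation of req (placeholder) ...
	return ctrl.Result{}, nil
}
```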

It would be nice to have an optional Init callback on controllers, which gets invoked after all caches are synced, but before the controller is started. This would provide a clear point in the code where one-time initialization should happen, and keep that logic out of the reconcile function.
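
Purely as an illustration of the shape such a hook could take (no such interface exists in controller-runtime; the name and signature are hypothetical):

```go
package hooks

import "context"

// Initializer is a hypothetical optional callback: invoked after all caches
// have synced and before the first Reconcile call, giving one-time setup a
// clear home outside of the Reconcile function.
type Initializer interface {
	Init(ctx context.Context) error
}
```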

@gerred (Contributor) commented Jan 10, 2020

I'm 👍 to the callback route instead of a flag.

Some questions I have:

  • Is Init() intended to be idempotent on a controller restart?
  • Is Init() called again after the cache re-sync period? Relates to idempotency, but is subtly different.
  • What is the failure mode for these callbacks at the CR level? How does CR let operators/controllers manage callback errors?

@danderson

> • Is Init() intended to be idempotent on a controller restart?

Assuming no change to the cluster state, I would say yes. However, I'm not sure there's value in requiring it? Maybe I don't understand what you mean by controller restart; do you mean something other than a whole-process restart?

> • Is Init() called again after the cache re-sync period? Relates to idempotency, but is subtly different.

I don't think so. Init is used to go from "I have no idea about the world" to "I have some mostly-updated idea about the world". From there, re-syncs (and any reconciles triggered by them) are just part of keeping up with changes to the world.

> • What is the failure mode for these callbacks at the CR level? How does CR let operators/controllers manage callback errors?

I'd be happy with either of two options: error != nil from Init is a fatal error (the whole controller errors out), or behavior similar to Reconcile, where the controller can request retries. I think the latter would be preferable, in that it lets the controller fail on transient errors external to k8s without giving up (e.g. an S3 controller failing because S3 is having an outage, but CR keeps retrying Init and eventually S3 is happy again).

@gerred (Contributor) commented Jan 10, 2020

> Assuming no change to the cluster state, I would say yes. However, I'm not sure there's value in requiring it? Maybe I don't understand what you mean by controller restart; do you mean something other than a whole-process restart?

I mean a whole-process restart, e.g. is there enough state stored in CRDs that we would expect idempotency? I guess that's up to the operator developer at that point and not the business of the callback mechanism itself.

@danderson

For my particular use case right now, I'd expect idempotency. However, I can imagine controllers where that wouldn't be the case, since this initialization may rely on external systems.

As a concrete example: a common request I get is to make MetalLB integrate with DHCP servers, to dynamically obtain/release IPs. If I were to support that, during Init I'd be poking the DHCP server for each IP listed as assigned in k8s, to check that the DHCP lease is still good. If we're talking "restart after kill -9", then the reconstructed state would probably end up being the same as the state we just lost... But it's dependent on the good behavior of a non-k8s system.

@alvaroaleman (Member, Author)

@danderson So your issue is only about being notified about a synced cache before any reconciliation has started? If I understand the issue correctly, this also implies that this "Init code" has to finish executing before reconciliation is attempted?

If the above is correct, would it be possible to use the APIReader within the Add function of your controller?
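
A minimal sketch of that suggestion, illustrative only: the reader returned by mgr.GetAPIReader() talks directly to the API server, so it can be used in the add function before the caches are started. Note, however, that the add function runs during manager setup, i.e. before leader election.

```go
package example

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/manager"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// Add builds the initial internal state via the API reader, then registers
// the controller as usual.
func Add(mgr manager.Manager) error {
	// Read the current state straight from the API server; no cache required.
	var svcs corev1.ServiceList
	if err := mgr.GetAPIReader().List(context.Background(), &svcs); err != nil {
		return err
	}
	// ... build initial internal state from svcs (placeholder) ...

	return ctrl.NewControllerManagedBy(mgr).
		For(&corev1.Service{}).
		Complete(reconcile.Func(func(ctx context.Context, req reconcile.Request) (reconcile.Result, error) {
			// ... steady-state reconciliation (placeholder) ...
			return reconcile.Result{}, nil
		}))
}
```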

@vincepri (Member)

/kind design
/milestone Next
/help

@k8s-ci-robot (Contributor)

@vincepri:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/kind design
/milestone Next
/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the kind/design and help wanted labels on Feb 21, 2020
@vincepri added this to the Next milestone on Feb 21, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on May 21, 2020
@vincepri (Member)

/lifecycle frozen

@k8s-ci-robot added the lifecycle/frozen label and removed the lifecycle/stale label on May 21, 2020
@terinjokes (Contributor)

I have a similar use case where I need to build up an in-memory allocation map after being elected leader, but before running the reconcile loop. Is there interest in adding a hook to the internal controller to support this use case?

terinjokes added a commit to terinjokes/controller-runtime that referenced this issue Nov 14, 2022
When implementing a controller that uses leader election, there may be
work that needs to be done after winning the election but before
processing enqueued requests. For example, a controller may need to
build up an internal mapping of the current state of the cluster before
it can begin reconciling.

This changeset adds support for adding prestart hooks to
controller-runtime's internal controller implementation. This hook runs
after the controller's caches have been synchronized, but before the
reconciliation workers have been started. The `PrestartHookable`
interface has been added to allow users to determine if hooks are
supported.

Fixes kubernetes-sigs#607
terinjokes added a commit to terinjokes/controller-runtime that referenced this issue Nov 14, 2022
terinjokes added a commit to terinjokes/controller-runtime that referenced this issue Nov 16, 2022
When implementing a controller that uses leader election, there may be
work that needs to be done after winning the election but before
processing enqueued requests. For example, a controller may need to
build up an internal mapping of the current state of the cluster before
it can begin reconciling.

This changeset adds support for adding prestart hooks to
controller-runtime's manager implementation. This hook runs after the
manager has been elected leader, immediately before the leader election
controllers are started.

Related kubernetes-sigs#607
terinjokes added a commit to terinjokes/controller-runtime that referenced this issue Nov 16, 2022
terinjokes added a commit to terinjokes/controller-runtime that referenced this issue Nov 17, 2022
terinjokes added a commit to terinjokes/controller-runtime that referenced this issue Nov 17, 2022
terinjokes added a commit to terinjokes/controller-runtime that referenced this issue Dec 27, 2022
terinjokes added a commit to terinjokes/controller-runtime that referenced this issue Dec 27, 2022
terinjokes added a commit to terinjokes/controller-runtime that referenced this issue Feb 21, 2023
@cbroglie

I have a use case where reconcilers for different resources depend on some common state which must be computed after leader election, but before reconciliation begins. I'm currently handling it by following the suggestion in #2044 (comment), and using NewUnmanaged to wrap each Start method with a call to initialize the common state if it hasn't already happened. It's working well enough, but landing #2044 would reduce the complexity.
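
A rough sketch of that wrapping approach (illustrative only: initState, initThenStart, and addController are made-up names, the watch setup is elided, and opts is assumed to carry the Reconciler):

```go
package example

import (
	"context"
	"sync"

	"sigs.k8s.io/controller-runtime/pkg/controller"
	"sigs.k8s.io/controller-runtime/pkg/manager"
)

var (
	initOnce sync.Once
	initErr  error
)

// initState computes the common state exactly once; later calls return the
// result of the first attempt.
func initState(ctx context.Context) error {
	initOnce.Do(func() {
		// ... compute the common state shared by all reconcilers (placeholder);
		// record any failure in initErr ...
	})
	return initErr
}

// initThenStart wraps an unmanaged controller so that the shared
// initialization runs before its workers start processing requests. Start is
// only called after the manager has won leader election.
type initThenStart struct {
	controller.Controller
}

func (w *initThenStart) Start(ctx context.Context) error {
	if err := initState(ctx); err != nil {
		return err
	}
	return w.Controller.Start(ctx)
}

// NeedLeaderElection keeps the wrapped controller in the leader-election group.
func (w *initThenStart) NeedLeaderElection() bool { return true }

// addController creates an unmanaged controller, wraps it, and hands it to
// the manager.
func addController(mgr manager.Manager, name string, opts controller.Options) error {
	c, err := controller.NewUnmanaged(name, mgr, opts)
	if err != nil {
		return err
	}
	// ... set up c.Watch(...) as usual (elided) ...
	return mgr.Add(&initThenStart{Controller: c})
}
```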

terinjokes added a commit to terinjokes/controller-runtime that referenced this issue Mar 14, 2023
terinjokes added a commit to terinjokes/controller-runtime that referenced this issue Apr 24, 2023
terinjokes added a commit to terinjokes/controller-runtime that referenced this issue Jul 20, 2023
terinjokes added a commit to terinjokes/controller-runtime that referenced this issue Jul 20, 2023
terinjokes added a commit to terinjokes/controller-runtime that referenced this issue Jul 20, 2023
terinjokes added a commit to terinjokes/controller-runtime that referenced this issue Jan 22, 2024
terinjokes added a commit to terinjokes/controller-runtime that referenced this issue Jan 29, 2024
terinjokes added a commit to terinjokes/controller-runtime that referenced this issue Jan 29, 2024
terinjokes added a commit to terinjokes/controller-runtime that referenced this issue May 6, 2024
terinjokes added a commit to terinjokes/controller-runtime that referenced this issue Aug 25, 2024