Preserve current capi cluster CP endpoint on updates #9267
Conversation
Force-pushed from b0372e3 to 78b8c8b
Codecov Report
Attention: Patch coverage is
Additional details and impacted files:

@@            Coverage Diff             @@
##             main    #9267      +/-   ##
==========================================
+ Coverage   72.35%   72.42%   +0.06%
==========================================
  Files         587      589       +2
  Lines       46140    46388     +248
==========================================
+ Hits        33385    33596     +211
- Misses      11006    11032      +26
- Partials     1749     1760      +11

View full report in Codecov by Sentry.
Force-pushed from 78b8c8b to 1436832
Testing:
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: 2ez4szliu. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/cherry-pick release-0.21
@sp1999: new pull request created: #9274. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/cherry-pick release-0.20
@sp1999: new pull request created: #9275. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Description of changes
It appears that our controller has always been emptying the capi cluster ControlPlaneEndpoint on every reconciliation. This seems to be a consequence of how server-side apply interacts with Go JSON marshaling of structs with zero values. Since the field is a struct and not a pointer, even when it is not set (zero value for all fields) it gets marshaled into a JSON object whose fields hold the equivalent of the Go zero value for their types. We verified this by looking at the audit logs: the request body from the eks-a controller contains the empty controlPlaneEndpoint object, and in the response we can see that the API server has set these fields to empty. In fact, the eks-a manager then becomes an owner of this field in the managed fields.
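To illustrate the marshaling behavior, here is a minimal sketch using simplified stand-ins for the CAPI types (field names are illustrative, not the exact upstream definitions). Because the field is a non-pointer struct, `omitempty` has no effect and the zero value is always serialized as a JSON object with zero-valued fields:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Simplified stand-in for CAPI's APIEndpoint.
type APIEndpoint struct {
	Host string `json:"host"`
	Port int32  `json:"port"`
}

// Simplified stand-in for the capi Cluster spec.
type ClusterSpec struct {
	// Non-pointer struct: `omitempty` cannot omit it, so the zero value
	// still shows up in the marshaled output.
	ControlPlaneEndpoint APIEndpoint `json:"controlPlaneEndpoint,omitempty"`
}

func main() {
	out, _ := json.Marshal(ClusterSpec{})
	fmt.Println(string(out))
	// Prints: {"controlPlaneEndpoint":{"host":"","port":0}}
	// Sent as a server-side apply patch, this makes the applying manager
	// claim ownership of controlPlaneEndpoint and reset it to empty values.
}
```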
This hasn't been an issue until now, but only by luck. With the new version of CAPI, the KCP controller behavior has changed slightly due to a refactor in the code handling the status.
If the cluster ControlPlaneEndpoint is not set, the KCP controller assumes this is a "pre-creation" situation and skips most of the loop. As a consequence, the KCP status ends up looking weird: zero replicas (the KCP thinks this is a new cluster, so it assumes 0 machines) and X available replicas (calculated by looking at the target worker nodes, which obviously exist). This makes the unavailable replicas take a negative value (for example, 0 reported replicas against 3 available gives -3), which trips our pre-upgrade validations on the eks-a side.
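A minimal sketch of the approach suggested by the PR title, using controller-runtime's client and a hypothetical helper name (the actual eks-a controller code may differ): read the live capi Cluster and carry its existing ControlPlaneEndpoint over to the desired object before applying, so the server-side apply patch no longer zeroes it out.

```go
package reconciler

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// preserveControlPlaneEndpoint is a hypothetical helper: it copies the
// ControlPlaneEndpoint from the live capi Cluster onto the desired object
// before it is applied.
func preserveControlPlaneEndpoint(ctx context.Context, c client.Client, desired *clusterv1.Cluster) error {
	current := &clusterv1.Cluster{}
	err := c.Get(ctx, client.ObjectKeyFromObject(desired), current)
	if apierrors.IsNotFound(err) {
		// Cluster not created yet: nothing to preserve.
		return nil
	}
	if err != nil {
		return err
	}
	// Only carry the endpoint over if the live object actually has one set.
	if current.Spec.ControlPlaneEndpoint.Host != "" || current.Spec.ControlPlaneEndpoint.Port != 0 {
		desired.Spec.ControlPlaneEndpoint = current.Spec.ControlPlaneEndpoint
	}
	return nil
}
```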
Testing
@2ez4szliu ran some e2e tests manually.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.