
An expected drift via scheduling.areRequirementsDrifted does not happen when adding a new requirements field #1974

Open
flavono123 opened this issue Feb 7, 2025 · 3 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. triage/needs-information Indicates an issue needs more information in order to work on it.

Comments

@flavono123
Contributor

flavono123 commented Feb 7, 2025

Description

The original Slack question: https://kubernetes.slack.com/archives/C02SFFZSA2K/p1738835029796219

I updated the NodePool, which previously had no topology.kubernetes.io/zone requirement, to add one with an In operator and the values ["us-east-1a", "us-east-1b", "us-east-1c"]; this is the only change from the previous version.

There were NodeClaims in "us-east-1f", so I expected drift to be detected by scheduling.areRequirementsDrifted, but it was not.

Observed Behavior:

No drift occurred.

Expected Behavior:

A drift should have occurred.

AFAIK, the NodePool's hash would not change, since I only changed the requirements field, which is excluded from hashing:

Requirements []NodeSelectorRequirementWithMinValues `json:"requirements" hash:"ignore"`
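
To illustrate my understanding of why the hash stays the same: hashstructure-style hashing skips fields tagged hash:"ignore" (which is what the tag above suggests), so a requirements-only change produces an identical hash. A minimal sketch with a simplified stand-in struct, not Karpenter's actual types:

package main

import (
	"fmt"

	"github.com/mitchellh/hashstructure/v2"
)

// Simplified stand-in for the NodePool template spec; only the tag matters here.
type fakeSpec struct {
	InstanceFamily string
	Requirements   []string `hash:"ignore"` // skipped when computing the hash
}

func main() {
	before := fakeSpec{InstanceFamily: "c6in"}
	after := fakeSpec{
		InstanceFamily: "c6in",
		Requirements:   []string{"topology.kubernetes.io/zone In us-east-1a,us-east-1b,us-east-1c"},
	}

	h1, _ := hashstructure.Hash(before, hashstructure.FormatV2, nil)
	h2, _ := hashstructure.Hash(after, hashstructure.FormatV2, nil)
	fmt.Println(h1 == h2) // true: a requirements-only change does not move the hash
}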

So I focused on scheduling.areRequirementsDrifted, but each step seems fine; both of my custom test cases pass:

         Context("Intersects", func() {
		It("should occur an error if the requirements do not intersect", func() {
			nodeClaimReq := NewLabelRequirements(map[string]string{
				"topology.kubernetes.io/zone": "us-east-1f",
			})

			nodepoolReq := NewNodeSelectorRequirementsWithMinValues(
				v1.NodeSelectorRequirementWithMinValues{
					NodeSelectorRequirement: corev1.NodeSelectorRequirement{
						Key: "topology.kubernetes.io/zone",
						Operator: corev1.NodeSelectorOpIn,
						Values: []string{"us-east-1a", "us-east-1b", "us-east-1c"},
					},
				},
			)

			Expect(nodeClaimReq.Intersects(nodepoolReq)).To(HaveOccurred())
		})
	})

	Context("Compatible", func() {
		It("should detect zone requirement drift", func() {
			nodeClaimReq := NewLabelRequirements(map[string]string{
				"topology.kubernetes.io/zone": "us-east-1f",
			})

			nodepoolReq := NewNodeSelectorRequirementsWithMinValues(
				v1.NodeSelectorRequirementWithMinValues{
					NodeSelectorRequirement: corev1.NodeSelectorRequirement{
						Key: "topology.kubernetes.io/zone",
						Operator: corev1.NodeSelectorOpIn,
						Values: []string{"us-east-1a", "us-east-1b", "us-east-1c"},
					},
				},
			)

			Expect(nodeClaimReq.Compatible(nodepoolReq)).NotTo(BeNil())
		})
	})
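
For reference, this is my rough mental model of how those pieces combine in the drift check, as a simplified sketch (the function name and structure are mine, not the actual source):

package drift

import (
	v1 "sigs.k8s.io/karpenter/pkg/apis/v1"
	"sigs.k8s.io/karpenter/pkg/scheduling"
)

// requirementsDrifted is a simplified sketch of my understanding of the check.
func requirementsDrifted(nodePool *v1.NodePool, nodeClaim *v1.NodeClaim) bool {
	// Requirements declared on the NodePool template.
	nodePoolReq := scheduling.NewNodeSelectorRequirementsWithMinValues(nodePool.Spec.Template.Spec.Requirements...)

	// Requirements derived from the NodeClaim's own labels,
	// e.g. topology.kubernetes.io/zone=us-east-1f.
	nodeClaimReq := scheduling.NewLabelRequirements(nodeClaim.Labels)

	// Drifted when the NodeClaim's labels are no longer compatible
	// with what the NodePool now requires.
	return nodeClaimReq.Compatible(nodePoolReq) != nil
}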

I think I was investigating the wrong place.

Reproduction Steps (Please include YAML):

The previous NodePool is below. Note that the NodeClaim in us-east-1f was actually provisioned because of the referenced EC2NodeClass (its subnet selector includes us-east-1f), not because of the NodePool's requirements:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  annotations:
    karpenter.sh/nodepool-hash: "10684252683046609404"
    karpenter.sh/nodepool-hash-version: v3
  name: mesh-1-31
spec:
  disruption:
    budgets:
      - nodes: "1"
      - nodes: "0"
        reasons:
          - Drifted
    consolidateAfter: 0s
    consolidationPolicy: WhenEmptyOrUnderutilized
  template:
    spec:
      expireAfter: 720h
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: mainshard-1-31
      requirements:
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values:
            - c6in
        - key: kubernetes.io/os
          operator: In
          values:
            - linux
        - key: karpenter.sh/capacity-type
          operator: In
          values:
            - on-demand
+        - key: topology.kubernetes.io/zone
+          operator: In
+          values:
+            - us-east-1a
+            - us-east-1b
+            - us-east-1c

Changed to the following by adding the block above:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  annotations:
    karpenter.sh/nodepool-hash: "10684252683046609404"
    karpenter.sh/nodepool-hash-version: v3
  name: mesh-1-31
spec:
  disruption:
    budgets:
      - nodes: "1"
      - nodes: "0"
        reasons:
          - Drifted
    consolidateAfter: 0s
    consolidationPolicy: WhenEmptyOrUnderutilized
  template:
    spec:
      expireAfter: 720h
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: mainshard-1-31
      requirements:
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values:
            - c6in
        - key: topology.kubernetes.io/zone
          operator: In
          values:
            - us-east-1a
            - us-east-1b
            - us-east-1c
        - key: kubernetes.io/os
          operator: In
          values:
            - linux
        - key: karpenter.sh/capacity-type
          operator: In
          values:
            - on-demand

There are node disruption budgets set (see spec.disruption.budgets above), but I guess they are not relevant.

Versions:

  • Chart Version: 1.0.0
  • Kubernetes Version (kubectl version):
Client Version: v1.32.0
Kustomize Version: v5.5.0
Server Version: v1.31.4-eks-2d5f260
  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@flavono123 flavono123 added the kind/bug Categorizes issue or PR as related to a bug. label Feb 7, 2025
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Feb 7, 2025
@jonathan-innis
Member

Can you validate what the NodeClaim looks like if it is in us-east-1f -- does it contain the topology.kubernetes.io/zone key in the labels section? We use the NodeClaim to determine if requirements are drifted from our NodePools -- the primary reason for this was so that users could alter the Node under the hood (labels changed, etc.) without it being drifted -- though, I'm not sure that there is actually a strong case for this still existing.

In either case, if the NodeClaim doesn't have the label or requirement, then that check is going to succeed after the requirement update because the NodeClaim doesn't know which zone it's in.
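
For example, kubectl get nodeclaim <nodeclaim-name> -o jsonpath='{.metadata.labels}' (the NodeClaim name is just a placeholder) will print the labels so you can check whether topology.kubernetes.io/zone is present.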

@jonathan-innis
Member

/triage needs-information

@k8s-ci-robot k8s-ci-robot added triage/needs-information Indicates an issue needs more information in order to work on it. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 13, 2025
@flavono123
Contributor Author

Yes, I'm quite sure it had the topology.kubernetes.io/zone label:

  1. I didn't customize the key.
  2. Another NodeClaim in another NodePool, which has no requirement for topology.kubernetes.io/zone, still has that label with the value of the AZ it is in, and so does its Node (I can attach the instance ID, but I'm not sure that's the proper way).

Plus, the NodePool has a zero budget for Drifted, but AFAIK the node should still be disrupted since Drift is an eventual disruption class, right?
