DaemonSet pods cycle in Failed state due to rescheduling to a node that is being terminated #2009
Labels
kind/bug
needs-priority
needs-triage
Description
Observed Behavior:
DaemonSet (DS) pods cycle through `Failed` -> `Pending` on a node that is being consolidated and later terminated. DS pods (like aws-node, kube-proxy) usually tolerate all taints. Karpenter puts the `karpenter.sh/disrupted:NoSchedule` taint on the node; the DS pods then enter the `Failed` state and new pods appear in `Pending`. Extra `not-ready` taints are also added, so the newly scheduled pods are rejected. This repeats until the node is terminated and removed from the cluster. When a large number of nodes is rotated at the same time, our alerting fires false-positive alerts about multiple pods flapping between `Failed`/`Pending`.
See attached log files and screenshot.
eks-controller-logs.csv
karpenter-controller.log
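For context, the taint set on the node during the disruption window looks roughly like the fragment below (an illustrative reconstruction, not a verbatim dump; see the attached logs for the actual data):

```yaml
# Illustrative Node spec fragment (reconstructed, not copied from our cluster).
# karpenter.sh/disrupted is added by Karpenter when disruption begins;
# the standard node.kubernetes.io/not-ready taint shows up as the node
# shuts down, rejecting the freshly rescheduled DS pods.
spec:
  taints:
    - key: karpenter.sh/disrupted
      effect: NoSchedule
    - key: node.kubernetes.io/not-ready
      effect: NoSchedule
```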
We observe the same issue on newer versions, but not on versions 0.37 and below.
Expected Behavior:
We do not observe DaemonSet pods flipping between `Failed`/`Pending` during node consolidation/termination.
Reproduction Steps (Please include YAML):
Simply consolidate a node controlled by Karpenter and watch the Kubernetes cluster Events or the EKS controller logs. You will see Karpenter remove all pods from the node except the DS pods and put the `karpenter.sh/disrupted:NoSchedule` taint there; the DS pods enter the `Failed` state and transition to `Pending`, `not-ready` taints are added to the node, and after ~1 minute the node is terminated. A minimal manifest that reproduces this is sketched below.
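Since the template asks for YAML, here is a minimal, hypothetical DaemonSet that exhibits the flapping (any DS with a tolerate-all toleration, such as aws-node or kube-proxy, behaves the same):

```yaml
# Minimal repro DaemonSet (hypothetical manifest for illustration).
# The blanket toleration below is what lets DS pods ignore the
# karpenter.sh/disrupted:NoSchedule taint and get rescheduled onto
# the draining node over and over.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: repro-ds
spec:
  selector:
    matchLabels:
      app: repro-ds
  template:
    metadata:
      labels:
        app: repro-ds
    spec:
      tolerations:
        - operator: Exists   # tolerates every taint, NoSchedule and NoExecute alike
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
```

Deploy it, let Karpenter consolidate the node it runs on, and watch the pod cycle between `Failed` and `Pending` in the cluster Events.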
Versions: