
Remove taint node.ocs.openshift.io/storage=true:NoSchedule from nodes during teardown of test_non_ocs_taint_and_tolerations #11456

am-agrawa opened this issue Feb 20, 2025 · 0 comments


All nodes are tainted with node.ocs.openshift.io/storage=true:NoSchedule during test_non_ocs_taint_and_tolerations, so workload pods in subsequent tests cannot reach the Running state. We should remove this taint during the test's teardown so that these follow-on failures are avoided.
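A minimal sketch of what such a teardown could look like, assuming a pytest finalizer built on the ocs_ci.utility.utils.run_cmd helper that appears in the log below; the fixture name remove_ocs_taint and the use of ignore_error are illustrative assumptions, not the existing test's code:

```python
# Hypothetical teardown sketch: untaint all nodes so later tests can
# schedule pods again. run_cmd is the ocs-ci helper seen in the log;
# everything else here is an assumption for illustration.
import logging

import pytest

from ocs_ci.utility.utils import run_cmd

log = logging.getLogger(__name__)

TAINT = "node.ocs.openshift.io/storage=true:NoSchedule"


@pytest.fixture
def remove_ocs_taint(request):
    def finalizer():
        log.info("Removing taint %s from all nodes", TAINT)
        # The trailing "-" tells `oc adm taint` to remove the taint.
        # ignore_error=True keeps teardown from failing on nodes that
        # never had the taint ("taint ... not found").
        run_cmd(f"oc adm taint nodes --all {TAINT}-", ignore_error=True)

    request.addfinalizer(finalizer)
```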

For example, test_rolling_shutdown_and_recovery failed later in the same run:

```
2025-01-04 09:30:26  23:00:25 - MainThread - ocs_ci.utility.utils - INFO  - Executing command: oc --kubeconfig /home/jenkins/current-cluster-dir/openshift-cluster-dir/auth/kubeconfig -n namespace-test-da4558c57f524a4fbef3c6e17 get Pod pod-test-rbd-10940c87cd414a149be90cbba73 -n namespace-test-da4558c57f524a4fbef3c6e17
2025-01-04 09:30:26  23:00:25 - MainThread - ocs_ci.ocs.ocp - INFO  - status of pod-test-rbd-10940c87cd414a149be90cbba73 at column STATUS was Pending, but we were waiting for Running
2025-01-04 09:30:26  23:00:25 - MainThread - ocs_ci.ocs.ocp - ERROR  - timeout expired: Timed out after 300s running get("pod-test-rbd-10940c87cd414a149be90cbba73", True, None)
2025-01-04 09:30:26  23:00:25 - MainThread - ocs_ci.utility.utils - INFO  - Executing command: oc -n namespace-test-da4558c57f524a4fbef3c6e17 describe Pod pod-test-rbd-10940c87cd414a149be90cbba73
2025-01-04 09:30:26  23:00:26 - MainThread - ocs_ci.ocs.ocp - WARNING  - Description of the resource(s) we were waiting for:
2025-01-04 09:30:26  Name:             pod-test-rbd-10940c87cd414a149be90cbba73
2025-01-04 09:30:26  Namespace:        namespace-test-da4558c57f524a4fbef3c6e17
2025-01-04 09:30:26  Priority:         0
2025-01-04 09:30:26  Service Account:  default
2025-01-04 09:30:26  Node:             <none>
2025-01-04 09:30:26  Labels:           <none>
2025-01-04 09:30:26  Annotations:      openshift.io/scc: anyuid
2025-01-04 09:30:26  Status:           Pending
2025-01-04 09:30:26  IP:               
2025-01-04 09:30:26  IPs:              <none>
2025-01-04 09:30:26  Containers:
2025-01-04 09:30:26    web-server:
2025-01-04 09:30:26      Image:        quay.io/ocsci/nginx:fio
2025-01-04 09:30:26      Port:         <none>
2025-01-04 09:30:26      Host Port:    <none>
2025-01-04 09:30:26      Environment:  <none>
2025-01-04 09:30:26      Mounts:
2025-01-04 09:30:26        /var/lib/www/html from mypvc (rw)
2025-01-04 09:30:26        /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9qgzm (ro)
2025-01-04 09:30:26  Conditions:
2025-01-04 09:30:26    Type           Status
2025-01-04 09:30:26    PodScheduled   False 
2025-01-04 09:30:26  Volumes:
2025-01-04 09:30:26    mypvc:
2025-01-04 09:30:26      Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
2025-01-04 09:30:26      ClaimName:  pvc-test-8c43d351e6364cd8a28cce5078383c6
2025-01-04 09:30:26      ReadOnly:   false
2025-01-04 09:30:26    kube-api-access-9qgzm:
2025-01-04 09:30:26      Type:                    Projected (a volume that contains injected data from multiple sources)
2025-01-04 09:30:26      TokenExpirationSeconds:  3607
2025-01-04 09:30:26      ConfigMapName:           kube-root-ca.crt
2025-01-04 09:30:26      ConfigMapOptional:       <nil>
2025-01-04 09:30:26      DownwardAPI:             true
2025-01-04 09:30:26      ConfigMapName:           openshift-service-ca.crt
2025-01-04 09:30:26      ConfigMapOptional:       <nil>
2025-01-04 09:30:26  QoS Class:                   BestEffort
2025-01-04 09:30:26  Node-Selectors:              <none>
2025-01-04 09:30:26  Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
2025-01-04 09:30:26                               node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
2025-01-04 09:30:26  Events:
2025-01-04 09:30:26    Type     Reason            Age   From               Message
2025-01-04 09:30:26    ----     ------            ----  ----               -------
2025-01-04 09:30:26    Warning  FailedScheduling  5m1s  default-scheduler  0/6 nodes are available: 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 3 node(s) had untolerated taint {node.ocs.openshift.io/storage: true}. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.
```

The failure occurred on ODF 4.18.0-121.
