Running oc login in parallel with other oc commands breaks/corrupts the kubeconfig file #11451

petr-balogh · 2025-02-20T14:21:23Z

For example, with this PR:
#11249

We are trying to refresh the connection to Prometheus:

2025-02-19 23:38:34  17:38:34 - Thread-2 - ocs_ci.utility.utils - INFO  - Executing command: ['oc', 'login', '-u', 'kubeadmin', '-p', '*****']
2025-02-19 23:38:35  17:38:35 - MainThread - ocs_ci.utility.utils - INFO  - Ceph cluster health is HEALTH_OK.
2025-02-19 23:38:35  17:38:35 - MainThread - tests.conftest - INFO  - Ceph health check passed at setup
2025-02-19 23:38:35  17:38:35 - MainThread - ocs_ci.utility.utils - INFO  - Executing command: ['oc', 'login', '-u', 'kubeadmin', '-p', '*****']
2025-02-19 23:38:35  17:38:35 - Thread-2 - ocs_ci.utility.utils - INFO  - Executing command: oc --kubeconfig /home/jenkins/current-cluster-dir/openshift-cluster-dir/auth/kubeconfig -n openshift-monitoring whoami --show-token
2025-02-19 23:38:37  17:38:36 - MainThread - ocs_ci.utility.utils - INFO  - Executing command: oc --kubeconfig /home/jenkins/current-cluster-dir/openshift-cluster-dir/auth/kubeconfig -n openshift-monitoring whoami --show-token
2025-02-19 23:38:37  17:38:37 - Thread-2 - ocs_ci.utility.utils - INFO  - Executing command: oc --kubeconfig /home/jenkins/current-cluster-dir/openshift-cluster-dir/auth/kubeconfig -n openshift-monitoring get Route prometheus-k8s -n openshift-monitoring -o yaml
2025-02-19 23:38:37  17:38:37 - MainThread - ocs_ci.utility.utils - WARNING  - Command stderr: error: no token is currently in use for this session
2025-02-19 23:38:37  
2025-02-19 23:38:37  17:38:37 - MainThread - tests.conftest - ERROR  - There was a problem with connecting to Prometheus
2025-02-19 23:38:37  Traceback (most recent call last):
2025-02-19 23:38:37    File "/home/jenkins/workspace/qe-deploy-ocs-cluster-prod/ocs-ci/tests/conftest.py", line 3871, in log_alerts
2025-02-19 23:38:37      prometheus = PrometheusAPI(threading_lock=threading_lock)
2025-02-19 23:38:37    File "/home/jenkins/workspace/qe-deploy-ocs-cluster-prod/ocs-ci/ocs_ci/utility/prometheus.py", line 367, in __init__
2025-02-19 23:38:37      self.refresh_connection()
2025-02-19 23:38:37    File "/home/jenkins/workspace/qe-deploy-ocs-cluster-prod/ocs-ci/ocs_ci/utility/prometheus.py", line 392, in refresh_connection
2025-02-19 23:38:37      self._token = ocp.get_user_token()
2025-02-19 23:38:37    File "/home/jenkins/workspace/qe-deploy-ocs-cluster-prod/ocs-ci/ocs_ci/ocs/ocp.py", line 692, in get_user_token
2025-02-19 23:38:37      token = self.exec_oc_cmd(command, out_yaml_format=False).rstrip()
2025-02-19 23:38:37    File "/home/jenkins/workspace/qe-deploy-ocs-cluster-prod/ocs-ci/ocs_ci/ocs/ocp.py", line 212, in exec_oc_cmd
2025-02-19 23:38:37      out = run_cmd(
2025-02-19 23:38:37    File "/home/jenkins/workspace/qe-deploy-ocs-cluster-prod/ocs-ci/ocs_ci/utility/utils.py", line 487, in run_cmd
2025-02-19 23:38:37      completed_process = exec_cmd(
2025-02-19 23:38:37    File "/home/jenkins/workspace/qe-deploy-ocs-cluster-prod/ocs-ci/ocs_ci/utility/utils.py", line 709, in exec_cmd
2025-02-19 23:38:37      raise CommandFailed(
2025-02-19 23:38:37  ocs_ci.ocs.exceptions.CommandFailed: Error during execution of command: oc --kubeconfig /home/jenkins/current-cluster-dir/openshift-cluster-dir/auth/kubeconfig -n openshift-monitoring whoami --show-token.
2025-02-19 23:38:37  Error is error: no token is currently in use for this session
2025-02-19 23:38:37

This leaves the kubeconfig empty, and we then see errors like:

2025-02-19 23:37:51  E               ocs_ci.ocs.exceptions.CommandFailed: Error during execution of command: oc --kubeconfig /home/jenkins/current-cluster-dir/openshift-cluster-dir/auth/kubeconfig get Node ip-10-0-5-140.us-east-2.compute.internal.
2025-02-19 23:37:51  E               Error is Error in configuration: 
2025-02-19 23:37:51  E               * context was not found for specified context: namespace-test-f8301eeb6b864c34b4bc7a2a1/api-j402ai3c33ua-qe-rh-ocs-com:6443/kube:admin
2025-02-19 23:37:51  E               * cluster has no server defined

We should come up with a mechanism that prevents running oc login while another oc command is running. This could be a locking mechanism, or we could leave the shared kubeconfig untouched and run oc login against a backup copy or a different kubeconfig file, for example when refreshing the Prometheus connection for IO in the background.
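A minimal sketch of both options, assuming the helpers are written in Python like ocs-ci and that every oc call in the run goes through them. The names `kubeconfig_lock`, `run_oc`, `oc_login`, `token_from_private_kubeconfig` and the `<kubeconfig>.lock` sidecar file are hypothetical, not existing ocs-ci APIs. The lock uses `fcntl.flock` (POSIX only): ordinary oc commands take a shared lock so they can still run concurrently, while oc login takes an exclusive lock because it rewrites the file. The second helper assumes the copied kubeconfig's current context points at the cluster API server, as it does for the `oc login -u kubeadmin -p ...` call in the log above.

```python
import contextlib
import fcntl
import os
import shutil
import subprocess
import tempfile


@contextlib.contextmanager
def kubeconfig_lock(kubeconfig, exclusive=False):
    """Hold an advisory flock on a sidecar file next to the kubeconfig.

    Shared (LOCK_SH) for ordinary oc commands, exclusive (LOCK_EX) for
    oc login. Each call opens the lock file anew, so on Linux the lock
    serializes threads of one process as well as separate processes.
    """
    # Hypothetical naming convention: "<kubeconfig>.lock"
    with open(kubeconfig + ".lock", "a") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH)
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def run_oc(args, kubeconfig, exclusive=False, runner=subprocess.run):
    """Run an oc command while holding the kubeconfig lock (hypothetical helper)."""
    with kubeconfig_lock(kubeconfig, exclusive=exclusive):
        return runner(
            ["oc", "--kubeconfig", kubeconfig, *args],
            capture_output=True, text=True, check=True,
        )


def oc_login(user, password, kubeconfig, runner=subprocess.run):
    # oc login rewrites the kubeconfig, so it takes the exclusive lock
    return run_oc(["login", "-u", user, "-p", password], kubeconfig,
                  exclusive=True, runner=runner)


def token_from_private_kubeconfig(user, password, shared_kubeconfig,
                                  runner=subprocess.run):
    """Alternative: log in against a private copy; the shared file is never rewritten."""
    with tempfile.TemporaryDirectory() as tmp:
        private = os.path.join(tmp, "kubeconfig")
        # Copy under a shared lock so a concurrent login cannot hand us a half-written file
        with kubeconfig_lock(shared_kubeconfig):
            shutil.copyfile(shared_kubeconfig, private)
        oc_login(user, password, private, runner=runner)
        result = run_oc(["whoami", "--show-token"], private, runner=runner)
        return result.stdout.strip()
```

The `runner` parameter exists only so the helpers can be exercised without a cluster. The lock is advisory: an oc process started outside these helpers bypasses it. The private-copy variant would suit the background Prometheus refresh, since only the temporary copy is rewritten and the shared kubeconfig used by the main test thread stays intact.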

For now, we are reverting the PR here:
#11449
#11450
and in the other release branches.
