Running oc login in parallel while another oc command is running breaks/corrupts the kubeconfig file #11451

Open
petr-balogh opened this issue Feb 20, 2025 · 0 comments

Comments

@petr-balogh
Member

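For example, with this PR:
#11249

we are trying to refresh the connection to Prometheus, and oc login ends up being executed from Thread-2 and MainThread at almost the same time: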

2025-02-19 23:38:34  17:38:34 - Thread-2 - ocs_ci.utility.utils - INFO  - Executing command: ['oc', 'login', '-u', 'kubeadmin', '-p', '*****']
2025-02-19 23:38:35  17:38:35 - MainThread - ocs_ci.utility.utils - INFO  - Ceph cluster health is HEALTH_OK.
2025-02-19 23:38:35  17:38:35 - MainThread - tests.conftest - INFO  - Ceph health check passed at setup
2025-02-19 23:38:35  17:38:35 - MainThread - ocs_ci.utility.utils - INFO  - Executing command: ['oc', 'login', '-u', 'kubeadmin', '-p', '*****']
2025-02-19 23:38:35  17:38:35 - Thread-2 - ocs_ci.utility.utils - INFO  - Executing command: oc --kubeconfig /home/jenkins/current-cluster-dir/openshift-cluster-dir/auth/kubeconfig -n openshift-monitoring whoami --show-token
2025-02-19 23:38:37  17:38:36 - MainThread - ocs_ci.utility.utils - INFO  - Executing command: oc --kubeconfig /home/jenkins/current-cluster-dir/openshift-cluster-dir/auth/kubeconfig -n openshift-monitoring whoami --show-token
2025-02-19 23:38:37  17:38:37 - Thread-2 - ocs_ci.utility.utils - INFO  - Executing command: oc --kubeconfig /home/jenkins/current-cluster-dir/openshift-cluster-dir/auth/kubeconfig -n openshift-monitoring get Route prometheus-k8s -n openshift-monitoring -o yaml
2025-02-19 23:38:37  17:38:37 - MainThread - ocs_ci.utility.utils - WARNING  - Command stderr: error: no token is currently in use for this session
2025-02-19 23:38:37  
2025-02-19 23:38:37  17:38:37 - MainThread - tests.conftest - ERROR  - There was a problem with connecting to Prometheus
2025-02-19 23:38:37  Traceback (most recent call last):
2025-02-19 23:38:37    File "/home/jenkins/workspace/qe-deploy-ocs-cluster-prod/ocs-ci/tests/conftest.py", line 3871, in log_alerts
2025-02-19 23:38:37      prometheus = PrometheusAPI(threading_lock=threading_lock)
2025-02-19 23:38:37    File "/home/jenkins/workspace/qe-deploy-ocs-cluster-prod/ocs-ci/ocs_ci/utility/prometheus.py", line 367, in __init__
2025-02-19 23:38:37      self.refresh_connection()
2025-02-19 23:38:37    File "/home/jenkins/workspace/qe-deploy-ocs-cluster-prod/ocs-ci/ocs_ci/utility/prometheus.py", line 392, in refresh_connection
2025-02-19 23:38:37      self._token = ocp.get_user_token()
2025-02-19 23:38:37    File "/home/jenkins/workspace/qe-deploy-ocs-cluster-prod/ocs-ci/ocs_ci/ocs/ocp.py", line 692, in get_user_token
2025-02-19 23:38:37      token = self.exec_oc_cmd(command, out_yaml_format=False).rstrip()
2025-02-19 23:38:37    File "/home/jenkins/workspace/qe-deploy-ocs-cluster-prod/ocs-ci/ocs_ci/ocs/ocp.py", line 212, in exec_oc_cmd
2025-02-19 23:38:37      out = run_cmd(
2025-02-19 23:38:37    File "/home/jenkins/workspace/qe-deploy-ocs-cluster-prod/ocs-ci/ocs_ci/utility/utils.py", line 487, in run_cmd
2025-02-19 23:38:37      completed_process = exec_cmd(
2025-02-19 23:38:37    File "/home/jenkins/workspace/qe-deploy-ocs-cluster-prod/ocs-ci/ocs_ci/utility/utils.py", line 709, in exec_cmd
2025-02-19 23:38:37      raise CommandFailed(
2025-02-19 23:38:37  ocs_ci.ocs.exceptions.CommandFailed: Error during execution of command: oc --kubeconfig /home/jenkins/current-cluster-dir/openshift-cluster-dir/auth/kubeconfig -n openshift-monitoring whoami --show-token.
2025-02-19 23:38:37  Error is error: no token is currently in use for this session
2025-02-19 23:38:37  

This causes the kubeconfig to end up empty, and we are seeing errors like:

2025-02-19 23:37:51  E               ocs_ci.ocs.exceptions.CommandFailed: Error during execution of command: oc --kubeconfig /home/jenkins/current-cluster-dir/openshift-cluster-dir/auth/kubeconfig get Node ip-10-0-5-140.us-east-2.compute.internal.
2025-02-19 23:37:51  E               Error is Error in configuration: 
2025-02-19 23:37:51  E               * context was not found for specified context: namespace-test-f8301eeb6b864c34b4bc7a2a1/api-j402ai3c33ua-qe-rh-ocs-com:6443/kube:admin
2025-02-19 23:37:51  E               * cluster has no server defined

We should come up with a mechanism that prevents oc login from running while another oc command is running. Two options: use a locking mechanism, or keep a backup of the kubeconfig file and use a different kubeconfig file for oc login, for example when refreshing the Prometheus connection for I/O running in the background.
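Both options could be sketched roughly as follows. This is only an illustration and not existing ocs-ci code: the helper names `kubeconfig_lock`, `private_kubeconfig_copy` and `run_with_kubeconfig`, and the `<kubeconfig>.lock` sidecar file, are hypothetical. It assumes a POSIX host (it uses `fcntl.flock`) and relies on oc honoring the `KUBECONFIG` environment variable.

```python
import contextlib
import fcntl
import os
import shutil
import subprocess
import tempfile


@contextlib.contextmanager
def kubeconfig_lock(kubeconfig_path):
    """Hold an exclusive advisory lock tied to a kubeconfig file.

    The lock is taken on a sidecar file (<kubeconfig>.lock, a hypothetical
    naming choice), so the kubeconfig itself is never opened for writing.
    Each call opens its own file object, so the lock also serializes
    threads of one process, not only separate processes.
    """
    lock_path = f"{kubeconfig_path}.lock"
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


@contextlib.contextmanager
def private_kubeconfig_copy(kubeconfig_path):
    """Yield the path of a throwaway copy of the kubeconfig.

    The copy is made under the lock so a half-written file is not copied
    (this only helps if writers take the same lock). The copy is deleted
    when the block exits.
    """
    tmp_dir = tempfile.mkdtemp(prefix="kubeconfig-")
    copy_path = os.path.join(tmp_dir, "kubeconfig")
    try:
        with kubeconfig_lock(kubeconfig_path):
            shutil.copy2(kubeconfig_path, copy_path)
        yield copy_path
    finally:
        shutil.rmtree(tmp_dir, ignore_errors=True)


def run_with_kubeconfig(cmd, kubeconfig_path):
    """Run a command with KUBECONFIG pointing at the given file."""
    env = dict(os.environ, KUBECONFIG=kubeconfig_path)
    return subprocess.run(
        cmd, env=env, capture_output=True, text=True, check=True
    )


# Possible usage for the Prometheus token refresh (not executed here;
# "password" and "shared_kubeconfig" are placeholders):
#
# with private_kubeconfig_copy(shared_kubeconfig) as private:
#     run_with_kubeconfig(
#         ["oc", "login", "-u", "kubeadmin", "-p", password], private
#     )
#     token = run_with_kubeconfig(
#         ["oc", "whoami", "--show-token"], private
#     ).stdout.strip()
```

With the private copy, oc login only rewrites the temporary file, so other oc commands that read the shared kubeconfig are not affected. The lock on its own would only help if every code path that writes the kubeconfig also takes it.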

For now, we are reverting the PR here:
#11449
#11450
And other release branches...
