
OpenShift & k8s_info: Support Cluster Operator Info Gathering #869

Open
stratus-ss opened this issue Jan 28, 2025 · 6 comments · May be fixed by #879

Comments

@stratus-ss

SUMMARY

During an OpenShift installation, one of the checks to verify that the cluster is ready to proceed with configuration is to ensure that the Cluster Operators are in an Available: True, Degraded: False, Progressing: False state. While you can currently use the k8s_info module to get a JSON response, the resulting JSON needs to be iterated over several times to extract the appropriate status. Trying to parse and loop over this JSON with Ansible's native abilities is convoluted and ugly.

This feature should have the option to return the individual states of all operators, as well as the overall state of the cluster, in a true/false format. If all the operators are Available: True, Degraded: False, Progressing: False, then the overall state of the cluster should be true.
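To make the request concrete, here is a minimal sketch of the proposed aggregation. The function names and the data shape are illustrative (mirroring what k8s_info returns), not an actual module API:

```python
# Illustrative sketch only: function names and return shape are assumptions.
GOOD_STATES = {'Available': 'True', 'Degraded': 'False', 'Progressing': 'False'}

def operator_ready(conditions):
    """True when Available=True, Degraded=False, Progressing=False."""
    seen = {c['type']: c['status'] for c in conditions if c['type'] in GOOD_STATES}
    return seen == GOOD_STATES

def cluster_ready(operators):
    """Overall cluster state: true only if every operator is ready."""
    return all(operator_ready(op['status']['conditions']) for op in operators)

# Minimal example resembling a ClusterOperator resource list:
sample = [
    {'metadata': {'name': 'dns'},
     'status': {'conditions': [
         {'type': 'Available', 'status': 'True'},
         {'type': 'Degraded', 'status': 'False'},
         {'type': 'Progressing', 'status': 'False'},
     ]}},
]
print(cluster_ready(sample))  # True for this healthy sample
```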

ISSUE TYPE
  • Feature Idea
COMPONENT NAME

This is probably best placed in the k8s_info module.

ADDITIONAL INFORMATION

Checking the Cluster Operators is widely used to determine whether other automation against a cluster should continue. It is also used as a broad health check.

In Python, a simple loop gets you the status of the individual operators. The JSON file below was gathered using the k8s_info module and dumped to disk for easier processing:

import json

# Load the k8s_info output that was dumped to disk
with open("cluster_operators_k8s.json") as f:
    d = json.load(f)

condition_list = ['Degraded', 'Available', 'Progressing']
for entry in d['resources']:
    for condition in entry['status']['conditions']:
        current_condition = condition['type']
        component_name = entry['metadata']['name']
        condition_bool = condition['status']
        if current_condition in condition_list:
            print(f"{component_name}: {current_condition} --> {condition_bool}")

The data structure of the info is:

current_condition = d['resources'][first_index]['status']['conditions'][second_index]['type']
component_name = d['resources'][first_index]['metadata']['name']
condition_bool = d['resources'][first_index]['status']['conditions'][second_index]['status']

The output is obviously just a proof of concept and looks like:

console: Degraded --> False
console: Progressing --> False
console: Available --> True
control-plane-machine-set: Available --> True
control-plane-machine-set: Progressing --> False
control-plane-machine-set: Degraded --> False
csi-snapshot-controller: Degraded --> False
csi-snapshot-controller: Progressing --> False
csi-snapshot-controller: Available --> True
dns: Available --> True
dns: Progressing --> False
dns: Degraded --> False
etcd: Degraded --> False
etcd: Progressing --> False
etcd: Available --> True
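For reference, the same structure can be flattened into a per-operator mapping with a single comprehension. This is an illustrative sketch only, not part of the module:

```python
# Flatten the k8s_info resources list into {operator: {condition: status}}.
def summarize(resources):
    return {
        entry['metadata']['name']: {
            c['type']: c['status']
            for c in entry['status']['conditions']
            if c['type'] in ('Available', 'Degraded', 'Progressing')
        }
        for entry in resources
    }

resources = [
    {'metadata': {'name': 'console'},
     'status': {'conditions': [
         {'type': 'Degraded', 'status': 'False'},
         {'type': 'Progressing', 'status': 'False'},
         {'type': 'Available', 'status': 'True'},
     ]}},
]
print(summarize(resources))
# {'console': {'Degraded': 'False', 'Progressing': 'False', 'Available': 'True'}}
```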

I am happy to take on the work of a PR if this type of feature would be acceptable.

@stratus-ss stratus-ss changed the title OpenShift: Support Cluster Operator Info Gathering OpenShift & k8s_info: Support Cluster Operator Info Gathering Jan 28, 2025
@stratus-ss
Author

stratus-ss commented Feb 1, 2025

Upon reflection, as this module is meant to return only the REST object, does it make sense to have another module?

What I am asking for is an interpretation of the REST object, which in my estimation logically belongs in the k8s_info module, but that would mean the expected return values would need to be expanded.

@fabianvf
Contributor

fabianvf commented Feb 3, 2025

I believe if it's a primarily OpenShift-specific feature, it would land in https://github.com/openshift/community.okd. However, it could make sense to add a module or extend the k8s_info module with more logic for handling the conditions, since they are a pain to parse. I think if you wanted to go the community.okd route, you are pretty free to build whatever OpenShift-specific logic is most helpful for the use case. For this repo, it would need to be more generalized and have a use for the broader community as well.

@gravesm
Member

gravesm commented Feb 5, 2025

On further reflection and spending some time looking at this, I think the best thing may be to just add this functionality to this collection. It should be straightforward to do so. I believe all you will need to do is add a cluster_ready or cluster_operator_ready function (or whatever you want to call it) to this file:

def deployment_ready(deployment: ResourceInstance) -> bool:
You can see the other functions defined at the top here that are used to check ready state for specific resources. Define whatever logic you need there to check the various conditions. Then update the map with ClusterOperator and your new function here:
RESOURCE_PREDICATES = {
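A sketch of what such a predicate might look like, under the assumption that the conditions can be read as plain dicts (the real helpers receive a ResourceInstance); the function name and the map key are illustrative, not the merged code:

```python
# Hypothetical predicate in the style of the existing waiter helpers.
def cluster_operator_ready(operator) -> bool:
    """A ClusterOperator is ready when Available=True and both
    Degraded and Progressing are False."""
    want = {'Available': 'True', 'Degraded': 'False', 'Progressing': 'False'}
    found = {c['type']: c['status']
             for c in operator.get('status', {}).get('conditions', [])}
    return all(found.get(t) == s for t, s in want.items())

# The map entry would then look something like:
# RESOURCE_PREDICATES = {
#     ...,
#     "ClusterOperator": cluster_operator_ready,
# }
```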

After that, you should be able to call k8s_info like:

    # module name added for context; the original snippet showed only the parameters
    kubernetes.core.k8s_info:
      kind: ClusterOperator
      api_version: config.openshift.io/v1
      wait: true
      wait_timeout: 10

Can you also add a unit test for the new function? It can go in https://github.com/ansible-collections/kubernetes.core/blob/main/tests/unit/module_utils/test_waiter.py and you should be able to follow how it's done for the other resources.

The changes are isolated and the new logic mostly covered by unit tests so I think it's ok if there's no integration test for it.

@stratus-ss
Author

Can you also add a unit test for the new function? It can go in https://github.com/ansible-collections/kubernetes.core/blob/main/tests/unit/module_utils/test_waiter.py and you should be able to follow how it's done for the other resources.

The changes are isolated and the new logic mostly covered by unit tests so I think it's ok if there's no integration test for it.

Thanks for the reply! I have tried to work through integration tests, but I am not exactly sure how to mock out Cluster Operators in this situation. I have written some integration tests against my own OpenShift cluster, but I am unsure how to make a test without a cluster to test against.

If I have a kubeconfig and a working cluster, the integration test will run as expected. However, what I am struggling with is the assumption that I don't have an OpenShift cluster to test against.

Normally I would have JSON files representing different states (one each for a healthy and an unhealthy cluster). However, I can't seem to dig up information on how to override a section of the module in order to reference the file instead of connecting to a cluster.

I suppose the question is: is this even a valid approach? I am pretty sure k8s does not have ClusterOperators, and therefore a standard integration test is not likely to work. I see that the waiter is using pytest, so I will look more into this before speaking too much out of turn.

Thanks!

@gravesm
Member

gravesm commented Feb 6, 2025

You won't be able to write integration tests for this because, as you point out, k8s doesn't have ClusterOperators. We don't test this collection against openshift. In this case, I don't think we need an integration test, though, as what you're doing can be sufficiently covered by unit tests. You can look at

@pytest.mark.parametrize("deployment,expected", zip(DEPLOYMENTS, [True, False]))
for an example of what you'll need to do. Add a new cluster operator fixture to https://github.com/ansible-collections/kubernetes.core/tree/main/tests/unit/module_utils/fixtures that contains one or more ClusterOperator states, set these up like how it has been done with Deployments in
DEPLOYMENTS = resources("fixtures/deployments.yml")
Then you can test whether your new function correctly returns true or false based on the state of the resource passed to it.
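A self-contained sketch of that unit test: under pytest this would use @pytest.mark.parametrize as the Deployment tests do, but it is written as a plain loop here so it runs standalone. The inline fixture dicts and the cluster_operator_ready predicate are assumptions, not the merged code:

```python
# Inline dicts stand in for a fixtures/cluster_operators.yml file.
HEALTHY = {'status': {'conditions': [
    {'type': 'Available', 'status': 'True'},
    {'type': 'Degraded', 'status': 'False'},
    {'type': 'Progressing', 'status': 'False'},
]}}
DEGRADED = {'status': {'conditions': [
    {'type': 'Available', 'status': 'True'},
    {'type': 'Degraded', 'status': 'True'},
    {'type': 'Progressing', 'status': 'False'},
]}}

def cluster_operator_ready(operator):
    # Hypothetical predicate under test.
    want = {'Available': 'True', 'Degraded': 'False', 'Progressing': 'False'}
    found = {c['type']: c['status']
             for c in operator.get('status', {}).get('conditions', [])}
    return all(found.get(t) == s for t, s in want.items())

# Equivalent under pytest to:
# @pytest.mark.parametrize("operator,expected",
#                          zip([HEALTHY, DEGRADED], [True, False]))
def test_cluster_operator_ready():
    for operator, expected in zip([HEALTHY, DEGRADED], [True, False]):
        assert cluster_operator_ready(operator) is expected

test_cluster_operator_ready()
```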

@stratus-ss
Author

Thanks for the pointers. I have been able to construct something fairly simple, but I have a general question: given that the predicates simply return true or false, if the result is false, how would you know what is failing?

As far as I can tell, you would catch the failure and then get the ClusterOperator object with the k8s_info module anyway, and be back to parsing that output yourself.

Is this a correct statement? If it is, does that mean the more general question of what is failing falls outside the scope of this collection?
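One workaround for the diagnosis question above is to filter the k8s_info output down to just the operators that are not ready. This is an illustrative sketch only, not part of the collection:

```python
# Illustrative only: given the parsed k8s_info output, list the operators
# that are not in the ready state, so a failure can be diagnosed.
WANT = {'Available': 'True', 'Degraded': 'False', 'Progressing': 'False'}

def failing_operators(resources):
    failing = []
    for entry in resources:
        conditions = {c['type']: c['status']
                      for c in entry['status']['conditions']
                      if c['type'] in WANT}
        if conditions != WANT:
            failing.append(entry['metadata']['name'])
    return failing

resources = [
    {'metadata': {'name': 'dns'},
     'status': {'conditions': [
         {'type': 'Available', 'status': 'True'},
         {'type': 'Degraded', 'status': 'False'},
         {'type': 'Progressing', 'status': 'False'}]}},
    {'metadata': {'name': 'etcd'},
     'status': {'conditions': [
         {'type': 'Available', 'status': 'False'},
         {'type': 'Degraded', 'status': 'True'},
         {'type': 'Progressing', 'status': 'False'}]}},
]
print(failing_operators(resources))  # ['etcd']
```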

@stratus-ss stratus-ss linked a pull request Feb 11, 2025 that will close this issue