
OpenShift & k8s_info: Support Cluster Operator Info Gathering #869

Open
stratus-ss opened this issue Jan 28, 2025 · 6 comments · May be fixed by #879

Comments

@stratus-ss

SUMMARY

During an OpenShift installation, one of the checks to verify that the cluster is ready to proceed with configuration is to ensure that the Cluster Operators are in an Available: True, Degraded: False, Progressing: False state. While you can currently use the k8s_info module to get a JSON response, the resulting JSON needs to be iterated over several times to extract the appropriate status. Trying to parse and loop over this JSON with Ansible's native abilities is convoluted and ugly.

This feature should have the option to return the individual states of all operators, as well as the overall state of the cluster, in a true/false format. If all the operators are Available: True, Degraded: False, Progressing: False, then the overall state of the cluster should be true.
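To make the request concrete, here is a minimal sketch of the proposed aggregation. The function names and the data shape are illustrative (mirroring what k8s_info returns), not an actual module API:

```python
# Illustrative sketch only: function names and return shape are assumptions.
GOOD_STATES = {'Available': 'True', 'Degraded': 'False', 'Progressing': 'False'}

def operator_ready(conditions):
    """True when Available=True, Degraded=False, Progressing=False."""
    seen = {c['type']: c['status'] for c in conditions if c['type'] in GOOD_STATES}
    return seen == GOOD_STATES

def cluster_ready(operators):
    """Overall cluster state: true only if every operator is ready."""
    return all(operator_ready(op['status']['conditions']) for op in operators)

# Minimal example resembling a ClusterOperator resource list:
sample = [
    {'metadata': {'name': 'dns'},
     'status': {'conditions': [
         {'type': 'Available', 'status': 'True'},
         {'type': 'Degraded', 'status': 'False'},
         {'type': 'Progressing', 'status': 'False'},
     ]}},
]
print(cluster_ready(sample))  # True for this healthy sample
```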

ISSUE TYPE
  • Feature Idea
COMPONENT NAME

This is probably best placed in the k8s_info module.

ADDITIONAL INFORMATION

Checking the Cluster Operators is widely used to determine whether other automation against a cluster should continue. It is also used as a broad health check.

In Python, a simple loop gets you the status of the individual operators. The JSON file below was gathered using the k8s_info module and dumped to disk for easier processing:

import json

# Load the k8s_info output that was dumped to disk
with open("cluster_operators_k8s.json") as f:
    d = json.load(f)

condition_list = ['Degraded', 'Available', 'Progressing']
for entry in d['resources']:
    for condition in entry['status']['conditions']:
        current_condition = condition['type']
        component_name = entry['metadata']['name']
        condition_bool = condition['status']
        if current_condition in condition_list:
            print(f"{component_name}: {current_condition} --> {condition_bool}")

The data structure of the info is:

current_condition = d['resources'][first_index]['status']['conditions'][second_index]['type']
component_name = d['resources'][first_index]['metadata']['name']
condition_bool = d['resources'][first_index]['status']['conditions'][second_index]['status']

The output is obviously just a proof of concept and looks like:

console: Degraded --> False
console: Progressing --> False
console: Available --> True
control-plane-machine-set: Available --> True
control-plane-machine-set: Progressing --> False
control-plane-machine-set: Degraded --> False
csi-snapshot-controller: Degraded --> False
csi-snapshot-controller: Progressing --> False
csi-snapshot-controller: Available --> True
dns: Available --> True
dns: Progressing --> False
dns: Degraded --> False
etcd: Degraded --> False
etcd: Progressing --> False
etcd: Available --> True
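For reference, the same structure can be flattened into a per-operator mapping with a single comprehension. This is an illustrative sketch only, not part of the module:

```python
# Flatten the k8s_info resources list into {operator: {condition: status}}.
def summarize(resources):
    return {
        entry['metadata']['name']: {
            c['type']: c['status']
            for c in entry['status']['conditions']
            if c['type'] in ('Available', 'Degraded', 'Progressing')
        }
        for entry in resources
    }

resources = [
    {'metadata': {'name': 'console'},
     'status': {'conditions': [
         {'type': 'Degraded', 'status': 'False'},
         {'type': 'Progressing', 'status': 'False'},
         {'type': 'Available', 'status': 'True'},
     ]}},
]
print(summarize(resources))
# {'console': {'Degraded': 'False', 'Progressing': 'False', 'Available': 'True'}}
```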

I am happy to take on the work of a PR if this type of feature would be acceptable.

@stratus-ss stratus-ss changed the title OpenShift: Support Cluster Operator Info Gathering OpenShift & k8s_info: Support Cluster Operator Info Gathering Jan 28, 2025
@stratus-ss
Author

stratus-ss commented Feb 1, 2025

Upon reflection, as this module is meant to return only the REST object, does it make sense to have another module?

What I am asking for is an interpretation of the REST object, which in my estimation logically belongs in the k8s_info module, but that would mean the expected return values would need to be expanded.

@fabianvf
Contributor

fabianvf commented Feb 3, 2025

I believe if it's a primarily OpenShift-specific feature, it would land in https://github.com/openshift/community.okd. However, it could make sense to add a module or extend the k8s_info module with more logic for handling the conditions, since they are a pain to parse. I think if you wanted to go the community.okd route, you are pretty free to build whatever OpenShift-specific logic is most helpful for the use case. For this repo, it would need to be more generalized and have a use for the broader community as well.

@gravesm
Member

gravesm commented Feb 5, 2025

On further reflection and spending some time looking at this, I think the best thing may be to just add this functionality to this collection. It should be straightforward to do so. I believe all you will need to do is add a cluster_ready or cluster_operator_ready function (or whatever you want to call it) to this file:

def deployment_ready(deployment: ResourceInstance) -> bool:
You can see the other functions defined at the top here that are used to check ready state for specific resources. Define whatever logic you need there to check the various conditions. Then update the map with ClusterOperator and your new function here:
RESOURCE_PREDICATES = {
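A sketch of what such a predicate might look like, under the assumption that the conditions can be read as plain dicts (the real helpers receive a ResourceInstance); the function name and the map key are illustrative, not the merged code:

```python
# Hypothetical predicate in the style of the existing waiter helpers.
def cluster_operator_ready(operator) -> bool:
    """A ClusterOperator is ready when Available=True and both
    Degraded and Progressing are False."""
    want = {'Available': 'True', 'Degraded': 'False', 'Progressing': 'False'}
    found = {c['type']: c['status']
             for c in operator.get('status', {}).get('conditions', [])}
    return all(found.get(t) == s for t, s in want.items())

# The map entry would then look something like:
# RESOURCE_PREDICATES = {
#     ...,
#     "ClusterOperator": cluster_operator_ready,
# }
```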

After that, you should be able to call k8s_info like:

    # module name added for context; the original snippet showed only the parameters
    kubernetes.core.k8s_info:
      kind: ClusterOperator
      api_version: config.openshift.io/v1
      wait: true
      wait_timeout: 10

Can you also add a unit test for the new function? It can go in https://github.com/ansible-collections/kubernetes.core/blob/main/tests/unit/module_utils/test_waiter.py and you should be able to follow how it's done for the other resources.

The changes are isolated and the new logic mostly covered by unit tests so I think it's ok if there's no integration test for it.

@stratus-ss
Author

Can you also add a unit test for the new function? It can go in https://github.com/ansible-collections/kubernetes.core/blob/main/tests/unit/module_utils/test_waiter.py and you should be able to follow how it's done for the other resources.

The changes are isolated and the new logic mostly covered by unit tests so I think it's ok if there's no integration test for it.

Thanks for the reply! I have tried to work through integration tests, but I am not exactly sure how to mock out Cluster Operators in this situation. I have written some integration tests against my own OpenShift cluster, but I am unsure how to make a test without a cluster to test against.

If I have a kubeconfig and a working cluster, the integration test will run as expected. However, what I am struggling with is the assumption that I don't have an OpenShift cluster to test against.

Normally I would have JSON files representing different states (one each for a healthy and an unhealthy cluster). However, I can't seem to dig up information on how to override a section of the module in order to reference the file instead of connecting to a cluster.

I suppose the question is: is this even a valid approach? I am pretty sure k8s does not have ClusterOperators, and therefore a standard integration test is not likely to work. I see that the waiter is using pytest, so I will look more into this before speaking too much out of turn.

Thanks!

@gravesm
Member

gravesm commented Feb 6, 2025

You won't be able to write integration tests for this because, as you point out, k8s doesn't have ClusterOperators. We don't test this collection against openshift. In this case, I don't think we need an integration test, though, as what you're doing can be sufficiently covered by unit tests. You can look at

@pytest.mark.parametrize("deployment,expected", zip(DEPLOYMENTS, [True, False]))
for an example of what you'll need to do. Add a new cluster operator fixture to https://github.com/ansible-collections/kubernetes.core/tree/main/tests/unit/module_utils/fixtures that contains one or more ClusterOperator states, set these up like how it has been done with Deployments in
DEPLOYMENTS = resources("fixtures/deployments.yml")
Then you can test whether your new function correctly returns true or false based on the state of the resource passed to it.
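A self-contained sketch of that unit test: under pytest this would use @pytest.mark.parametrize as the Deployment tests do, but it is written as a plain loop here so it runs standalone. The inline fixture dicts and the cluster_operator_ready predicate are assumptions, not the merged code:

```python
# Inline dicts stand in for a fixtures/cluster_operators.yml file.
HEALTHY = {'status': {'conditions': [
    {'type': 'Available', 'status': 'True'},
    {'type': 'Degraded', 'status': 'False'},
    {'type': 'Progressing', 'status': 'False'},
]}}
DEGRADED = {'status': {'conditions': [
    {'type': 'Available', 'status': 'True'},
    {'type': 'Degraded', 'status': 'True'},
    {'type': 'Progressing', 'status': 'False'},
]}}

def cluster_operator_ready(operator):
    # Hypothetical predicate under test.
    want = {'Available': 'True', 'Degraded': 'False', 'Progressing': 'False'}
    found = {c['type']: c['status']
             for c in operator.get('status', {}).get('conditions', [])}
    return all(found.get(t) == s for t, s in want.items())

# Equivalent under pytest to:
# @pytest.mark.parametrize("operator,expected",
#                          zip([HEALTHY, DEGRADED], [True, False]))
def test_cluster_operator_ready():
    for operator, expected in zip([HEALTHY, DEGRADED], [True, False]):
        assert cluster_operator_ready(operator) is expected

test_cluster_operator_ready()
```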

@stratus-ss
Author

Thanks for the pointers. I have been able to construct something fairly simple, but I have a general question: given that the predicates simply return true or false, if the result is false, how would you know what is failing?

As far as I can tell, you would catch the failure and then get the ClusterOperator object with the k8s_info module anyway, and be back to parsing that output yourself.

Is this a correct statement? If it is, does that mean the more general question of what is failing falls outside the scope of this collection?
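One workaround for the diagnosis question above is to filter the k8s_info output down to just the operators that are not ready. This is an illustrative sketch only, not part of the collection:

```python
# Illustrative only: given the parsed k8s_info output, list the operators
# that are not in the ready state, so a failure can be diagnosed.
WANT = {'Available': 'True', 'Degraded': 'False', 'Progressing': 'False'}

def failing_operators(resources):
    failing = []
    for entry in resources:
        conditions = {c['type']: c['status']
                      for c in entry['status']['conditions']
                      if c['type'] in WANT}
        if conditions != WANT:
            failing.append(entry['metadata']['name'])
    return failing

resources = [
    {'metadata': {'name': 'dns'},
     'status': {'conditions': [
         {'type': 'Available', 'status': 'True'},
         {'type': 'Degraded', 'status': 'False'},
         {'type': 'Progressing', 'status': 'False'}]}},
    {'metadata': {'name': 'etcd'},
     'status': {'conditions': [
         {'type': 'Available', 'status': 'False'},
         {'type': 'Degraded', 'status': 'True'},
         {'type': 'Progressing', 'status': 'False'}]}},
]
print(failing_operators(resources))  # ['etcd']
```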

@stratus-ss stratus-ss linked a pull request Feb 11, 2025 that will close this issue