Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid flake unit tests #164

Merged
merged 2 commits into from
Mar 2, 2022
Merged

Avoid flake unit tests #164

merged 2 commits into from
Mar 2, 2022

Conversation

zeeke
Copy link
Contributor

@zeeke zeeke commented Feb 16, 2022

This to fix the flakiness of some unit test, as happened in #163

@zeeke zeeke force-pushed the log-level-flaky-test branch 5 times, most recently from 8f07b5c to 7b3c6cb Compare February 21, 2022 10:15
@zeeke
Copy link
Contributor Author

zeeke commented Feb 21, 2022

Added some debug log and discovered that during the cleanup phase the controller continue to reconcile the resource:

...
•2022/02/21 10:26:15 delete AddressPool
2022/02/21 10:26:15 delete BGPPeer
2022/02/21 10:26:15 delete BFDProfile
2022/02/21 10:26:15 delete ConfigMap
2022/02/21 10:26:15 reconciling (policy/v1beta1, Kind=PodSecurityPolicy) /controller
2022/02/21 10:26:15 update was successful
2022/02/21 10:26:15 reconciling (policy/v1beta1, Kind=PodSecurityPolicy) /speaker
2022/02/21 10:26:15 delete MetalLB
2022/02/21 10:26:15 update was successful
2022/02/21 10:26:15 reconciling (apps/v1, Kind=DaemonSet) metallb-test-namespace/speaker
2022/02/21 10:26:15 delete Deployment
2022/02/21 10:26:15 does not exist, creating (apps/v1, Kind=DaemonSet) metallb-test-namespace/speaker
2022/02/21 10:26:15 delete DaemonSet
2022/02/21 10:26:15 successfully created (apps/v1, Kind=DaemonSet) metallb-test-namespace/speaker
2022/02/21 10:26:15 update was successful
2022/02/21 10:26:15 reconciling (apps/v1, Kind=Deployment) metallb-test-namespace/controller
2022/02/21 10:26:15 does not exist, creating (apps/v1, Kind=Deployment) metallb-test-namespace/controller
2022/02/21 10:26:15 successfully created (apps/v1, Kind=Deployment) metallb-test-namespace/controller
2022/02/21 10:26:15 update was successful
2022/02/21 10:26:16 cleanTestNamespace end

I think we should look for a better way to clean test resources.

Is there a way to cleanly remove MetalLB from a cluster using only API resource?

return err
}
// Wait for the deletion queue to be empty, as the delete method is asynchronous in k8s APIs.
err = retry.Do(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is normally done with gomega's Eventually

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, but this function does not have Ginkgo assertion yet, so I'm going to return error without retries and put the eventually in caller side. WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me

@fedepaol
Copy link
Member

I think we should look for a better way to clean test resources.

Is there a way to cleanly remove MetalLB from a cluster using only API resource?

It would make sense for the controller to keep reconciling, if we did not have removed the MetalLB resource.
In that case, when we delete a "child" resource, the operator will reconcile and recerate it (pretty much what's happening in your logs). What I don't get is the fact that we deleted MetalLB, so ideally we should not have the owner anymore.
On top of that, because of the OwnerReference we set when we create the child resources, they should be implicitly deleted as soon as we zap the metallb resource, so I think checking that is probably our main suspect.

@zeeke
Copy link
Contributor Author

zeeke commented Feb 23, 2022

As far as I discovered, the envtest package does not include built-in controllers

https://book.kubebuilder.io/reference/envtest.html#testing-considerations

So objects need to be deleted manually, unless implementing some workaround like this

Anyway, I need to dig deeper to understand the test flakyness and I'm going to mark this PR as a draft

@zeeke zeeke marked this pull request as draft February 23, 2022 14:50
@zeeke zeeke force-pushed the log-level-flaky-test branch 2 times, most recently from 8ae38cf to 035816a Compare February 25, 2022 17:03
@zeeke
Copy link
Contributor Author

zeeke commented Feb 25, 2022

Digging deeper I discovered that MetalLB and ConfigMap reconcilers continue working during resource cleanup and there is no easy way to synchronize test code with production code, unless introducing a sort of "reconciler state" and exposing it to test.

So, the safest way to avoid flakes is to wrap assertions in Eventually(...) calls and wait for the resource to reach the desired state.

Setting environment variable in test causes tests
to be order dependent, since SPEKAER_IMAGE,
CONTROLLER_IMAGE, FRR_IMAGE and KUBE_RBAC_PROXY_IMAGE are mandatory
for the controller to work without errors.

Avoid set WATCH_NAMESPACE environment variable as
it's not used during tests.

Signed-off-by: Andrea Panattoni <[email protected]>
MetalLBReconciler and ConfigMapReconciler continue
reconciling even during the AfterEach phase of tests.
Since test code can't know if a reconciler is still working,
there is no way to safely and completely delete all resource
during cleanup. So it's better to wrap assertion in `Eventually()` call.

Signed-off-by: Andrea Panattoni <[email protected]>
@zeeke zeeke force-pushed the log-level-flaky-test branch from 035816a to 38ab6c9 Compare February 25, 2022 17:05
@zeeke
Copy link
Contributor Author

zeeke commented Feb 25, 2022

Digging deeper I discovered that MetalLB and ConfigMap reconciliers continue working during resource cleanup and there is no
easy way to synchronize test code with production code, unless introducing a sort of "reconcilier state" and exposing it to test.
So. the safest way to avoid flakes is to wrap assertions in Eventually(...) calls and wait for the resource to reach the desired state.

@zeeke zeeke marked this pull request as ready for review February 28, 2022 07:45
@pperiyasamy
Copy link
Contributor

another instance of same CI flake with this run https://github.com/metallb/metallb-operator/runs/5378870296?check_suite_focus=true

@fedepaol
Copy link
Member

fedepaol commented Mar 2, 2022

LGTM, thanks!

@fedepaol fedepaol merged commit cfd950d into metallb:main Mar 2, 2022
zeeke added a commit to zeeke/metallb-operator that referenced this pull request Mar 2, 2022
* Set environment variables during suite setup

Setting environment variable in test causes tests
to be order dependent, since SPEKAER_IMAGE,
CONTROLLER_IMAGE, FRR_IMAGE and KUBE_RBAC_PROXY_IMAGE are mandatory
for the controller to work without errors.

Avoid set WATCH_NAMESPACE environment variable as
it's not used during tests.

Signed-off-by: Andrea Panattoni <[email protected]>

* Retry asserts in controller tests

MetalLBReconciler and ConfigMapReconciler continue
reconciling even during the AfterEach phase of tests.
Since test code can't know if a reconciler is still working,
there is no way to safely and completely delete all resource
during cleanup. So it's better to wrap assertion in `Eventually()` call.

Signed-off-by: Andrea Panattoni <[email protected]>
(cherry picked from commit cfd950d)
fedepaol pushed a commit to fedepaol/metallb-operator that referenced this pull request Oct 20, 2023
OCPBUGS-14503: [4.10] Change the memberlist secret to contain path
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants