Tide: Use an index on the lister to filter prowjobs for subpool #14830

alvaroaleman · 2019-10-17T13:23:14Z

Because of #14798, this PR aims to reduce the CPU and memory consumption of Tide by indexing Presubmit/Batch prowjobs by baseSHA and then doing one list per subpool rather than a list of everything.

/assign @stevekuznetsov

alvaroaleman · 2019-10-17T13:35:48Z

The main drawback of this PR is that we no longer have unit tests for "do we actually add the index to the cache", so if someone removed that, tests won't notice but Tide will stop working.

The only way I see to test that is by using something like https://godoc.org/sigs.k8s.io/controller-runtime/pkg/envtest for spinning up an actual apiserver during tests.

stevekuznetsov

I would have expected to see the fake client use the index as well in tests?

alvaroaleman · 2019-10-17T18:35:11Z

> I would have expected to see the fake client use the index as well in tests?

The index is not in the client, it's in the cache. The production client uses the cache for all read operations, which is why the index can be used there.
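To illustrate why the index lives in the cache rather than in the client, here is a minimal sketch of a cache that maintains an index on write. It is a toy model using only the standard library, not the controller-runtime implementation; `prowJob`, `indexedCache`, `Add` and `ByIndex` are hypothetical names invented for this example.

```go
package main

import "fmt"

// prowJob is a hypothetical, stripped-down stand-in for the real ProwJob type.
type prowJob struct {
	Name    string
	BaseSHA string
}

// indexFunc maps an object to the keys it should be retrievable under.
type indexFunc func(pj prowJob) []string

// indexedCache is a toy cache that keeps one index up to date on every write.
type indexedCache struct {
	all   []prowJob
	index map[string][]prowJob
	fn    indexFunc
}

func newIndexedCache(fn indexFunc) *indexedCache {
	return &indexedCache{index: map[string][]prowJob{}, fn: fn}
}

// Add stores the object and records it under every key the index func returns.
func (c *indexedCache) Add(pj prowJob) {
	c.all = append(c.all, pj)
	for _, key := range c.fn(pj) {
		c.index[key] = append(c.index[key], pj)
	}
}

// ByIndex returns the objects stored under key without scanning c.all.
func (c *indexedCache) ByIndex(key string) []prowJob {
	return c.index[key]
}

func main() {
	cache := newIndexedCache(func(pj prowJob) []string { return []string{pj.BaseSHA} })
	cache.Add(prowJob{Name: "a", BaseSHA: "abc"})
	cache.Add(prowJob{Name: "b", BaseSHA: "def"})
	cache.Add(prowJob{Name: "c", BaseSHA: "abc"})
	fmt.Println(len(cache.ByIndex("abc")), len(cache.ByIndex("missing")))
}
```

A client that reads through such a cache gets the per-key lookup for free, while a fake client with no cache behind it has nothing to look the key up in.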

alvaroaleman · 2019-10-17T18:49:08Z

If we want a test where the index is being used, there is no way around spinning up an actual apiserver, which is reasonably easy via the envtest package.

fejta · 2019-10-17T20:41:17Z

/uncc
/cc @cjwagner

cjwagner

> If we want a test where the index is being used, there is no way around spinning up an actual apiserver, which is reasonably easy via the envtest package.

Since this is critical and might be adjusted later, this is definitely something we should be testing. Can/should we use envtest from golang unit tests, or do we need to use a separate integration test job?

prow/tide/tide.go

cjwagner · 2019-10-17T21:22:02Z

prow/tide/tide.go

 		}
-		sps[fn].pjs = append(sps[fn].pjs, pj)
+		c.logger.WithField("subpool", subpoolkey).Infof("Found %d prowjobs.", len(pjs.Items))
+		sps[subpoolkey].pjs = pjs.Items


The fact that the base SHA matches is not enough to know that this PJ matches the tide pool. It is possible for the same base SHA to exist in a different org/repo. More realistically, it is also possible for two different branches (BaseRef) to point to the same commit, and we need to distinguish the different pools in this case.

We don't necessarily need to make the index more complicated, but if we keep it as is we need to filter out PJs for other subpools like we did here before.
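A minimal sketch of the second option, assuming the index stays keyed on base SHA only and the results are filtered afterwards. The `refs`, `prowJob` and `subpool` types are simplified stand-ins for this example, and `filterForSubpool` is a hypothetical helper name, not a function from the PR.

```go
package main

import "fmt"

// refs is a simplified stand-in for the refs section of a ProwJob spec.
type refs struct {
	Org, Repo, BaseRef, BaseSHA string
}

// prowJob is a simplified stand-in for the real ProwJob type.
type prowJob struct {
	Name string
	Refs refs
}

// subpool identifies one Tide subpool.
type subpool struct {
	org, repo, branch, sha string
}

// filterForSubpool keeps only the prowjobs whose org, repo, branch and base SHA
// all match the subpool. A matching base SHA alone is not sufficient.
func filterForSubpool(pjs []prowJob, sp subpool) []prowJob {
	var out []prowJob
	for _, pj := range pjs {
		r := pj.Refs
		if r.Org == sp.org && r.Repo == sp.repo && r.BaseRef == sp.branch && r.BaseSHA == sp.sha {
			out = append(out, pj)
		}
	}
	return out
}

func main() {
	// All three jobs share a base SHA, as if they came from a SHA-only index.
	pjs := []prowJob{
		{Name: "match", Refs: refs{"org", "repo", "master", "abc"}},
		{Name: "other-repo", Refs: refs{"org", "fork", "master", "abc"}},
		{Name: "other-branch", Refs: refs{"org", "repo", "release", "abc"}},
	}
	for _, pj := range filterForSubpool(pjs, subpool{"org", "repo", "master", "abc"}) {
		fmt.Println(pj.Name)
	}
}
```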

Do we have unit tests for these scenarios? That's the best way to ensure it remains invariant.

Alvaro added unit tests, but they require the envtest package I asked about here: #14830 (review)

(Also if this is what we want to do, do you have any recommendations on the best way to do this with bazel?)

Can we add actual unit tests? I don't see any reason why we would need etcd in order to validate the behavior of

- the same SHA existing in multiple repos
- two branches pointing to the same commit

AKA ensuring

- commit C, repo R, branch B => index i1
- commit C, repo S, branch B => index i2
- commit C, repo R, branch D => index i3

(or at least that none of these equals the other) seems sufficient here
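The three cases can be checked without etcd or an apiserver. The sketch below assumes a key function with the argument order used in this thread (`org, repo, branch, sha`); the key format in the body of `cacheIndexKey` is an assumption made for illustration, and `allDistinct` is a hypothetical helper.

```go
package main

import "fmt"

// cacheIndexKey builds the index key for a subpool.
// The concrete format here is an assumption made for this sketch.
func cacheIndexKey(org, repo, branch, baseSHA string) string {
	return fmt.Sprintf("%s/%s:%s@%s", org, repo, branch, baseSHA)
}

// allDistinct reports whether no two of the given keys are equal.
func allDistinct(keys ...string) bool {
	seen := map[string]bool{}
	for _, k := range keys {
		if seen[k] {
			return false
		}
		seen[k] = true
	}
	return true
}

func main() {
	i1 := cacheIndexKey("org", "R", "B", "C") // commit C, repo R, branch B
	i2 := cacheIndexKey("org", "S", "B", "C") // commit C, repo S, branch B
	i3 := cacheIndexKey("org", "R", "D", "C") // commit C, repo R, branch D
	fmt.Println(allDistinct(i1, i2, i3))
}
```

This only asserts that the keys differ from each other, not what their exact values are.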

This seems more useful/important than validating the internal behavior of c.prowJobClient.List() (that would be more an integration test than unit test)

We validate that "input X return exactly Y". This is different than "ensure input X and X' return whatever values so long as they differ from each other", right?

Added a TestCacheIndexFuncReturnsDifferentResultsForDifferentInputs that verifies that.

Generally, using a cache with an index is probably the most efficient solution for repeatedly filtering kube API objects, so it would be great if we could find a way to test it that is agreeable to everyone :) How do you feel about using envtest for that, @fejta?

Faking the interface seems like the most appropriate strategy here.

Making everything depend on making real calls to the apiserver is an inefficient test strategy. We're just consumers of all this code. Why can't we just assume kubernetes works correctly?

We do not, for example, validate that the Google Cloud Storage client transfers files correctly. We just assume it works and fake the return values. This is efficient and hasn't led to any regressions.

I would recommend this faking strategy here over replacing unit testing with integration testing.

Okay. I've also opened kubernetes-sigs/controller-runtime#657 for this, because ideally we don't have to build fakes downstream.

While a test for "do we actually add the index to the cache" definitely makes sense, adding a fake client that uses an index pretty much only tests our dependencies (that do have tests for this) and our fake implementation. For the sake of getting this done, I'll add it nonetheless.

I agree that is probably more work than is justified? This is the call to test:

	err := c.prowJobClient.List(
		c.ctx,
		pjs,
		ctrlruntimeclient.MatchingField(cacheIndexName, cacheIndexKey(sp.org, sp.repo, sp.branch, sp.sha)),
		ctrlruntimeclient.InNamespace(c.config().ProwJobNamespace))

So wrap it in a function:

	type lister interface {
		List(context.Context, *prowapi.ProwJobList, matchingFieldRetValue, namespaceRetValue) error
	}

	func listMatchingJobs(ctx context.Context, prowJobClient lister, sp subpool, namespace string) (*prowapi.ProwJobList, error) {
		var pjs prowapi.ProwJobList
		if err := prowJobClient.List(ctx, &pjs, MatchingField(cacheIndexName, cacheIndexKey(sp.org, sp.repo, sp.branch, sp.sha)), InNamespace(namespace)); err != nil {
			return nil, err
		}
		return &pjs, nil
	}

Now write a fakeLister and validate that we send cacheIndexName, cacheIndexKey and namespace correctly.

Change all the list calls to this helper function.

Done.

I don't think creating a fake client that works correctly is super necessary here.

Well, I ended up writing a small implementation of the subset of the manager we need, plus something that just embeds an upstream fakeclient and wraps its List call to use an index func if requested.
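The general shape of that approach can be sketched as follows. This is not the code from the PR: `baseFake` stands in for an upstream fake client, and `indexedFake`, `indexFuncs` and `ListByIndex` are hypothetical names. The standard-library-only sketch embeds the base fake and filters its results through a registered index func.

```go
package main

import "fmt"

// prowJob is a simplified stand-in for the real ProwJob type.
type prowJob struct{ Name, BaseSHA string }

// baseFake stands in for an upstream fake client that returns every stored object.
type baseFake struct{ items []prowJob }

func (b *baseFake) List() []prowJob { return b.items }

// indexedFake embeds the base fake and adds index-aware listing on top of it.
type indexedFake struct {
	*baseFake
	indexFuncs map[string]func(prowJob) []string
}

// ListByIndex lists via the embedded fake and keeps only the objects whose
// index func yields the requested key. It fails if the index was never registered.
func (f *indexedFake) ListByIndex(indexName, key string) ([]prowJob, error) {
	fn, ok := f.indexFuncs[indexName]
	if !ok {
		return nil, fmt.Errorf("no index named %q registered", indexName)
	}
	var out []prowJob
	for _, pj := range f.baseFake.List() {
		for _, k := range fn(pj) {
			if k == key {
				out = append(out, pj)
				break
			}
		}
	}
	return out, nil
}

func main() {
	fake := &indexedFake{
		baseFake:   &baseFake{items: []prowJob{{"a", "abc"}, {"b", "def"}}},
		indexFuncs: map[string]func(prowJob) []string{"by-sha": func(pj prowJob) []string { return []string{pj.BaseSHA} }},
	}
	pjs, err := fake.ListByIndex("by-sha", "abc")
	fmt.Println(pjs, err)
	_, err = fake.ListByIndex("unregistered", "abc")
	fmt.Println(err)
}
```

Returning an error for an unregistered index is a design choice in this sketch: it makes a test fail when the code under test forgets to add the index.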

I think this should be good now, PTAL.

alvaroaleman · 2019-10-21T07:21:40Z

Updated the PR to also consider org, repo and branch in the index and to use envtest for testing, which allowed me to restore TestDividePool.

It won't pass in CI, though, until there is an etcd and kube-apiserver binary available. How can I build an image for CI?

> Can/should we use envtest from golang unit tests or do we need to use a separate integration test job

I don't care much either way. We only have to keep in mind that envtest, or more specifically spinning up etcd and kube-apiserver, is pretty sensitive to CPU starvation.

cjwagner

> It won't pass in CI, though, until there is an etcd and kube-apiserver binary available. How can I build an image for CI?

I think we'd want to rely on bazel instead of baking dependencies into a test image. Do you know how to make that work?
@fejta @stevekuznetsov WDYT about using envtest to spin up etcd and kube-apiserver in our unit tests? Is this reasonable or should we use a separate ProwJob (or use a different testing pattern altogether)?

prow/cmd/tide/main.go

cjwagner · 2019-10-21T21:13:42Z

prow/tide/tide.go

 		}
-		sps[fn].pjs = append(sps[fn].pjs, pj)
+		c.logger.WithField("subpool", subpoolkey).Infof("Found %d prowjobs.", len(pjs.Items))


nit: Debugf. Also might want to mention that this is the number of prowjobs found for the subpool before filtering.

Which filtering are you referring to by "before filtering"? They are already filtered by org/repo/branch+baseSHA

alvaroaleman · 2019-10-21T23:04:18Z

> I think we'd want to rely on bazel instead of baking dependencies into a test image. Do you know how to make that work?

Unfortunately not, do you have any kind of reference/sample for "Use bazel to download and provide binaries"?

fejta

LGTM

fejta · 2019-10-21T23:08:58Z

prow/tide/tide.go

 		}
-		sps[fn].pjs = append(sps[fn].pjs, pj)
+		c.logger.WithField("subpool", subpoolkey).Infof("Found %d prowjobs.", len(pjs.Items))
+		sps[subpoolkey].pjs = pjs.Items


Do we have unit tests for these scenarios? That's the best way to ensure it remains invariant.

fejta

/hold

fejta · 2019-10-23T21:09:25Z

prow/tide/tide_test.go

+				BaseRef: sp.branch,
+				BaseSHA: sp.sha,
+			}
+			if diff := deep.Equal(pj.Spec.Refs, referenceRef); diff != nil {


Does reflect.DeepEqual not work here?

It does work, but deep provides a very easy-to-read diff, which is super helpful when debugging issues with the tests.

cjwagner

/hold

k8s-ci-robot · 2019-10-25T21:01:56Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alvaroaleman, cjwagner, fejta

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [cjwagner,fejta]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

cjwagner · 2019-10-25T21:02:04Z

🤦‍♂️ I meant to
/hold cancel

k8s-ci-robot assigned stevekuznetsov Oct 17, 2019

k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 17, 2019

k8s-ci-robot requested review from clarketm and fejta October 17, 2019 13:24

k8s-ci-robot added area/prow Issues or PRs related to prow area/prow/tide Issues or PRs related to prow's tide component sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Oct 17, 2019

stevekuznetsov reviewed Oct 17, 2019

k8s-ci-robot requested review from cjwagner and removed request for fejta October 17, 2019 20:41

cjwagner reviewed Oct 17, 2019

alvaroaleman force-pushed the tide-index-prowjobs branch 2 times, most recently from b2de716 to 5485dd0 Compare October 20, 2019 18:02

cjwagner reviewed Oct 21, 2019

fejta reviewed Oct 21, 2019

alvaroaleman force-pushed the tide-index-prowjobs branch 2 times, most recently from fbec38a to e924d72 Compare October 23, 2019 08:26

alvaroaleman mentioned this pull request Oct 23, 2019

Allow unittesting indexes on the cache-backed reader kubernetes-sigs/controller-runtime#657

Open

alvaroaleman force-pushed the tide-index-prowjobs branch from e924d72 to f8c2e64 Compare October 23, 2019 20:44

fejta approved these changes Oct 23, 2019

k8s-ci-robot assigned fejta Oct 23, 2019

k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Oct 23, 2019

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 23, 2019

Tide: Use an index on the lister to filter prowjobs for subpool

67f02d0

alvaroaleman force-pushed the tide-index-prowjobs branch from f8c2e64 to 67f02d0 Compare October 24, 2019 11:44

k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 24, 2019

fejta approved these changes Oct 25, 2019

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 25, 2019

cjwagner approved these changes Oct 25, 2019

k8s-ci-robot assigned cjwagner Oct 25, 2019

k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 25, 2019

k8s-ci-robot merged commit 8840058 into kubernetes:master Oct 25, 2019

k8s-ci-robot added this to the v1.17 milestone Oct 25, 2019

alvaroaleman mentioned this pull request Oct 28, 2019

tide: excessive memory usage #14798

Closed
