Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tide: Use an index on the lister to filter prowjobs for subpool #14830

Merged
merged 1 commit into from
Oct 25, 2019

Conversation

alvaroaleman
Copy link
Member

Because of #14798 this PR aims at reducing the CPU and memory consumption of tide by indexing Presubmit/Batch prowjobs by baseSHA and then doing a list per subpool rather than a list of everything.

/assign @stevekuznetsov

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 17, 2019
@k8s-ci-robot k8s-ci-robot added area/prow Issues or PRs related to prow area/prow/tide Issues or PRs related to prow's tide component sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Oct 17, 2019
@alvaroaleman
Copy link
Member Author

alvaroaleman commented Oct 17, 2019

The main drawback of this PR is that we do not have unittests for "do we actually add the index to the cache" anymore, so if someone removed that, tests wont notice but Tide will not work anymore.

The only way I see to test that is by using something like https://godoc.org/sigs.k8s.io/controller-runtime/pkg/envtest for spinning up an actual apiserver during tests.

Copy link
Contributor

@stevekuznetsov stevekuznetsov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have expected to see the fake client use the index as well in tests?

@alvaroaleman
Copy link
Member Author

I would have expected to see the fake client use the index as well in tests?

The index is not in the client, its in the cache. The production client uses the cache for all read operations, thats why it can be used there.

@alvaroaleman
Copy link
Member Author

If we want a test where the index is being used, there is no way around spinning up an actual apiserver. Which is reasonable easy via the envtest package.

@fejta
Copy link
Contributor

fejta commented Oct 17, 2019

/uncc
/cc @cjwagner

@k8s-ci-robot k8s-ci-robot requested review from cjwagner and removed request for fejta October 17, 2019 20:41
Copy link
Member

@cjwagner cjwagner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want a test where the index is being used, there is no way around spinning up an actual apiserver. Which is reasonable easy via the envtest package.

Since this is critical and might be adjusted later this is definitely something we should be testing. Can/should we use envtest from golang unit tests or do we need to use a separate integration test job.

}
sps[fn].pjs = append(sps[fn].pjs, pj)
c.logger.WithField("subpool", subpoolkey).Infof("Found %d prowjobs.", len(pjs.Items))
sps[subpoolkey].pjs = pjs.Items
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that the base SHA matches is not enough to know that this PJ matches the tide pool. It is possible for the same base sha to exist in different org/repo. More realistically, it is also possible for two different branches (BaseRef) to point to the same commit and we need to distinguish the different pools in this case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't necessarily need to make the index more complicated, but if we keep it as is we need to filter out PJs for other subpools like we did here before.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have unit tests for these scenarios? That's the best way to ensure it remains invariant.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alvaro added unit tests, but they require the envtest package I asked about here: #14830 (review)

(Also if this is what we want to do, do you have any recommendations on the best way to do this with bazel?)

Copy link
Contributor

@fejta fejta Oct 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add actual unit tests? I don't see any reason why we would need etcd in order to validate the behavior of

  • the same SHA existing in multiple repos
  • two branches pointing to the same commit

AKA ensuring

  • commit C repo R, branch B => index i1
  • commit C, repo S, branch B => index i2
  • commit C, repo R, branch D => index i3
    (or at least that none of these equals the other) seems sufficient here

This seems more useful/important than validating the internal behavior of c.prowJobClient.List() (that would be more an integration test than unit test)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We validate that "input X return exactly Y". This is different than "ensure input X and X' return whatever values so long as they differ from each other", right?

Added a TestCacheIndexFuncReturnsDifferentResultsForDifferentInputs that verifies that.

Generally, using a cache with an index is probably the most efficient solution for filtering kube api objects repeatedly, so it would be great if we can find a way to test that that is agreeable for everyone :) How do you feel about using envtest for that @fejta ?

Copy link
Contributor

@fejta fejta Oct 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Faking the interface seems like the most appropriate strategy here.

Making everything depend on making real calls to the apiserver is an inefficient test strategy. We're just consumers of all this code. Why can't we just assume kubernetes works correctly?

We do not, for example, validate the the Google cloud storage client transfers files correctly. We just assume it works and fake the return values. This is efficient and hasn't lead to any regressions.

I would recommend this faking strategy here over replacing unit testing with integration testing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. I've also opened kubernetes-sigs/controller-runtime#657 for this, because ideally we don't have to build fakes downstream.

While a test for "do we actually add the index to the cache" definitely makes sense, adding a fake client that uses an index pretty much only tests our dependencies (that do have tests for this) and our fake implementation. For the sake of getting this done, I'll add it nonetheless.

Copy link
Contributor

@fejta fejta Oct 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that is probably more work than is justified? This is the call to test:

		err := c.prowJobClient.List(
			c.ctx,
			pjs,
			ctrlruntimeclient.MatchingField(cacheIndexName, cacheIndexKey(sp.org, sp.repo, sp.branch, sp.sha)),
			ctrlruntimeclient.InNamespace(c.config().ProwJobNamespace))

So wrap it in a function:

type lister interface {
  List(context.Context, *prowapi.ProwJobList, matchingFieldRetValue, namespaceRetValue) error
}

func listMatchingJobs(ctx context.Context, prowJobClient lister, sp subpool, namespace string) (*prowapi.ProwJobList, error) {
  var pjs prowapi.ProwJobList
  if err := prowJobClient.List(ctx, &pjs, MatchingField(cacheIndexName, cacheIndexKey(sp.org, sp.repo, sp.branch, sp.sha)), InNamespace(namespace); err != nil {
    return nil, err
  }
  return &pjs

Now write a fakeLister and validate that we send cacheIndexName, cacheIndexKey and namespace correctly.

Change all the list calls to this helper function.

Done.

I don't think creating a fake client that works correctly is super necessary here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I ended up writing a small implementation of the subset of the manager we need and something that just embedds a an upstream fakeclient and wraps its List call to use an index func if requested.

I think this should be good now, PTAL.

@alvaroaleman alvaroaleman force-pushed the tide-index-prowjobs branch 2 times, most recently from b2de716 to 5485dd0 Compare October 20, 2019 18:02
@alvaroaleman
Copy link
Member Author

Updated the PR to also considers org, repo and branch in the repo and use envtest for testing, which allowed me to restore TestDividePool.

It won't pass in CI thought until there is an etcd and kube-apiserver binary available. How can I build an image for CI?

Can/should we use envtest from golang unit tests or do we need to use a separate integration test job

I don't care much either way, we only have to keep in mind that envtest or more specifically spinning up etcd and kube-apiserver is pretty sensitive to CPU starvation.

Copy link
Member

@cjwagner cjwagner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It won't pass in CI thought until there is an etcd and kube-apiserver binary available. How can I build an image for CI?

I think we'd want to rely on bazel instead of baking dependencies into a test image. Do you know how to make that work?
@fejta @stevekuznetsov WDYT about using envtest to spin up etcd and kube-apiserver in our unit tests? Is this reasonable or should we use a separate ProwJob (or use a different testing pattern altogether)?

}
sps[fn].pjs = append(sps[fn].pjs, pj)
c.logger.WithField("subpool", subpoolkey).Infof("Found %d prowjobs.", len(pjs.Items))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Debugf. Also might want to mention that this is the number of prowjobs found for the subpool before filtering.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which filtering are you referring to by "before filtering"? They are already filtered by org/repo/branch+baseSHA

@alvaroaleman
Copy link
Member Author

think we'd want to rely on bazel instead of baking dependencies into a test image. Do you know how to make that work?

Unfortunately not, do you have any kind of reference/sample for "Use bazel to download and provide binaries"?

Copy link
Contributor

@fejta fejta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

}
sps[fn].pjs = append(sps[fn].pjs, pj)
c.logger.WithField("subpool", subpoolkey).Infof("Found %d prowjobs.", len(pjs.Items))
sps[subpoolkey].pjs = pjs.Items
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have unit tests for these scenarios? That's the best way to ensure it remains invariant.

Copy link
Contributor

@fejta fejta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/hold

BaseRef: sp.branch,
BaseSHA: sp.sha,
}
if diff := deep.Equal(pj.Spec.Refs, referenceRef); diff != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does reflect.DeepEqual not work here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does work, but deep provides a very easy to read diff which is super helpful when debugging issues with the tests

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Oct 23, 2019
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 23, 2019
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 24, 2019
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 25, 2019
Copy link
Member

@cjwagner cjwagner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/hold

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alvaroaleman, cjwagner, fejta

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@cjwagner
Copy link
Member

🤦‍♂️ I meant to
/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 25, 2019
@k8s-ci-robot k8s-ci-robot merged commit 8840058 into kubernetes:master Oct 25, 2019
@k8s-ci-robot k8s-ci-robot added this to the v1.17 milestone Oct 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/prow/tide Issues or PRs related to prow's tide component area/prow Issues or PRs related to prow cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants