Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggestion volume resumePolicy override #1992

Draft
wants to merge 9 commits into
base: master
Choose a base branch
from
Draft

Conversation

a9p
Copy link
Contributor

@a9p a9p commented Nov 1, 2022

What this PR does / why we need it:

PBT currently relies on resumePolicy to be FromVolume to run properly. The proposed changes follow up on previous discussion on removing the requirement for users specifying this field and instead handling the logic automatically for suggestions that require it.

Which issue(s) this PR fixes:
Fixes #1893

Checklist:

  • Docs included if any changes are user facing

@a9p
Copy link
Contributor Author

a9p commented Nov 1, 2022

cc/ @andreyvelich was this closer to what you were looking for with your original comment?

@coveralls
Copy link

coveralls commented Nov 1, 2022

Coverage Status

Coverage decreased (-0.1%) to 73.398% when pulling 3f1eba0 on a9p:reconcile into 6b55540 on kubeflow:master.

Copy link
Member

@tenzen-y tenzen-y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@a9p Thanks for driving this!
I left a few comments.

pkg/controller.v1beta1/suggestion/suggestion_controller.go Outdated Show resolved Hide resolved
@tenzen-y
Copy link
Member

tenzen-y commented Nov 3, 2022

@a9p please get in touch with me when this PR is ready for review.

@a9p a9p marked this pull request as ready for review November 4, 2022 08:16
@a9p
Copy link
Contributor Author

a9p commented Nov 4, 2022

@tenzen-y thanks for the suggestion! I think this PR is ready for your review now -- let me know if you have any other comments!

Copy link
Member

@tenzen-y tenzen-y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@a9p Thanks for updating this PR! This feature looks good to me.

Comment on lines 201 to 202
// If ResumePolicy = FromVolume (or overriden), volume is reconciled for suggestion
if suggestionConfigData.VolumeForceMount || instance.Spec.ResumePolicy == experimentsv1beta1.FromVolume {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add the test for this logic?

Copy link
Member

@andreyvelich andreyvelich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @a9p.
I left few suggestions.

@@ -42,6 +42,7 @@ type SuggestionConfig struct {
PersistentVolumeClaimSpec corev1.PersistentVolumeClaimSpec `json:"persistentVolumeClaimSpec,omitempty"`
PersistentVolumeSpec corev1.PersistentVolumeSpec `json:"persistentVolumeSpec,omitempty"`
PersistentVolumeLabels map[string]string `json:"persistentVolumeLabels,omitempty"`
VolumeForceMount bool `json:"volumeForceMount,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid additional VolumeForceMount parameter in config ?

One suggestion that I have: Always get desired volume if Suggestion Config Data contains Persistent Volume Spec or if Resume Policy = From Volume.
The default values for PVC, we can set under the Desired Volume function.

I understand that it doesn't solve the problem when user wants to use the default values for Suggestion volume and still attach the volume, but we can find the better solution.

Any other ideas how to avoid more parameters in Katib Config @a9p @tenzen-y @johnugeorge ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always get desired volume if Suggestion Config Data contains Persistent Volume Spec or if Resume Policy = From Volume.
The default values for PVC, we can set under the Desired Volume function.

Maybe, it's idea is better to avoid more parameters in katib-config.

Any other ideas how to avoid more parameters in Katib Config

I have no good ideas.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with removing the new parameter 👍🏽

I'm not sure what a good default is for the mounts or if there is one -- it seems like that would be setup dependent. One thought would be to prevent the suggestion service from starting if the mount doesn't exist in the config (making the config issue apparent up front) for services that require one? Currently, PBT for example, basically does a random search at the initial seed until run limits are hit which is a bit opaque unless you read the documentation for the config needed to operate correctly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thought would be to prevent the suggestion service from starting if the mount doesn't exist in the config (making the config issue apparent up front) for services that require one?

@a9p How we are going to understand for which Suggestion service the Volume is mandatory ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was originally thinking something like this, but hardcoding this in the suggestion controller isn't ideal. Possibly linking the required resources to the suggestion service and reading that in at the controller (e.g. an api change to the rpc endpoint implemented by the suggestion defaulting to no additional requirements)?

Copy link
Contributor Author

@a9p a9p Mar 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @andreyvelich / @tenzen-y ! Curious if either of you have any thoughts on the above?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@a9p Apologies for the late reply, please can we discuss it again after the Kubeflow release (e.g. after 2 weeks) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andreyvelich / @tenzen-y -- just following up if either of you have any thoughts on the best way to include the k8s spec in this repo?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late response. I'm missing the notifications. I came back to this thread right now.
I'll check your changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is git submoduling in the manager/v1beta1 directory the preferred approach? I put a temporary shim in the build.sh tying it to the minor k8s version just to verify static checks, but the code should still fail at runtime since it's deleting the directory.

@a9p Thanks for your investigating.
Can we store the proto definition for the k8s.io/apps/v1 only when generating codes? I guess we can store the k8s proto file locally like the following.

https://github.com/kubeflow/mpi-operator/blob/c034b8caf613b2951aa2550c32f1333c58f56a39/Makefile#L170-L172

I guess that we can do that with GOPATH=/tmp go install k8s.io/api.
I appreciate your effort!

}

// If ResumePolicy = FromVolume (or overriden), volume is reconciled for suggestion
if suggestionConfigData.VolumeForceMount || instance.Spec.ResumePolicy == experimentsv1beta1.FromVolume {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to identify more places where we check if Suggestion has volume and we need to do something.
e.g. Controller deletes volume when Experiment is finished:

} else if instance.Spec.ResumePolicy == experimentsv1beta1.FromVolume {

Maybe we can update the Suggestion CR's API with volumeAttached: bool in addition to resumePolicy.
So we can do appropriate changes based on this spec setting.
We are creating Suggestion CR before reconcile Suggestion, so you don't need to get Katib Config before reconcile Suggestion volume.

What do you think about this API change @tenzen-y @johnugeorge @a9p ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the conditions under which katib-controller sets volumeAttached to true? Do you plan to do only when Spec.ResumePolicy == experimentsv1beta1.FromVolume? Or also check whether status.phase == Bound and so on in PVC?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more case where it would be true is where the service needs a volume independent of ResumePolicy (this is what I think the original issue was pointing at separating)

@a9p
Copy link
Contributor Author

a9p commented Jun 4, 2023

/hold

Copy link
Member

@andreyvelich andreyvelich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @a9p, do you have any free time to complete this PR so we can merge it as part of our next release (Katib release-0.16)?

@a9p
Copy link
Contributor Author

a9p commented Jul 25, 2023

Hi @andreyvelich, it doesn't look like I will be able to wrap up this PR before feature freeze. Could we push this to the next release target? I should have a fair amount more time to work on this closer to October.

@andreyvelich
Copy link
Member

Sure, no problem @a9p!
Thank you for your time

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@tenzen-y
Copy link
Member

@a9p Hi, are you still here?

Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@tenzen-y
Copy link
Member

/remove-lifecycle stale

Still, we should move this forward.

Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Copy link

This pull request has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

@github-actions github-actions bot closed this May 16, 2024
@andreyvelich andreyvelich reopened this May 16, 2024
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign tenzen-y for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@andreyvelich
Copy link
Member

/remove-lifecycle stale
/help
/good-first-issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Separate suggestion volume reconcile from resumePolicy
4 participants