Suggestion volume resumePolicy override #1992
cc/ @andreyvelich Was this closer to what you were looking for with your original comment?
@a9p Thanks for driving this!
I left a few comments.
@a9p please get in touch with me when this PR is ready for review.
@tenzen-y thanks for the suggestion! I think this PR is ready for your review now. Let me know if you have any other comments!
@a9p Thanks for updating this PR! This feature looks good to me.
// If ResumePolicy = FromVolume (or overridden), volume is reconciled for suggestion
if suggestionConfigData.VolumeForceMount || instance.Spec.ResumePolicy == experimentsv1beta1.FromVolume {
Can you add the test for this logic?
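One way such a test could look is a table-driven check over the condition itself. This is only a sketch: shouldReconcileVolume is a hypothetical helper that isolates the condition from the diff, and ResumePolicyType with its string values is a local stand-in for the experimentsv1beta1 type rather than the real import.

```go
package main

import "fmt"

// ResumePolicyType is a local stand-in for the experiment resume policy type;
// the string values are assumed for illustration.
type ResumePolicyType string

const (
	NeverResume ResumePolicyType = "Never"
	LongRunning ResumePolicyType = "LongRunning"
	FromVolume  ResumePolicyType = "FromVolume"
)

// shouldReconcileVolume is a hypothetical helper that isolates the condition
// from the diff so it can be checked without a running controller.
func shouldReconcileVolume(volumeForceMount bool, policy ResumePolicyType) bool {
	return volumeForceMount || policy == FromVolume
}

func main() {
	cases := []struct {
		name       string
		forceMount bool
		policy     ResumePolicyType
		want       bool
	}{
		{"FromVolume without override", false, FromVolume, true},
		{"override with Never", true, NeverResume, true},
		{"override with LongRunning", true, LongRunning, true},
		{"no override, LongRunning", false, LongRunning, false},
		{"no override, Never", false, NeverResume, false},
	}
	for _, c := range cases {
		got := shouldReconcileVolume(c.forceMount, c.policy)
		fmt.Printf("%-30s got=%v want=%v\n", c.name, got, c.want)
	}
}
```

In the controller's own test suite the same table would more likely drive the reconcile path and assert on whether the PV/PVC objects were created.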
Thanks for this, @a9p.
I left a few suggestions.
@@ -42,6 +42,7 @@ type SuggestionConfig struct {
 	PersistentVolumeClaimSpec corev1.PersistentVolumeClaimSpec `json:"persistentVolumeClaimSpec,omitempty"`
 	PersistentVolumeSpec      corev1.PersistentVolumeSpec      `json:"persistentVolumeSpec,omitempty"`
 	PersistentVolumeLabels    map[string]string                `json:"persistentVolumeLabels,omitempty"`
+	VolumeForceMount          bool                             `json:"volumeForceMount,omitempty"`
Can we avoid the additional VolumeForceMount parameter in the config?
One suggestion: always get the desired volume if the Suggestion Config Data contains a Persistent Volume Spec or if Resume Policy = From Volume.
We can set the default values for the PVC in the Desired Volume function.
I understand that this doesn't solve the case where the user wants to use the default values for the Suggestion volume and still attach the volume, but we can find a better solution.
Any other ideas for how to avoid more parameters in Katib Config, @a9p @tenzen-y @johnugeorge?
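As a rough illustration of that suggestion, the check could look something like the sketch below. Everything here is a simplified assumption: volumeSpec stands in for corev1.PersistentVolumeSpec, suggestionConfig for the Katib config struct, and needsDesiredVolume is a hypothetical name, not a function in the PR.

```go
package main

import (
	"fmt"
	"reflect"
)

// ResumePolicyType is a local stand-in; the value is assumed for illustration.
type ResumePolicyType string

const FromVolume ResumePolicyType = "FromVolume"

// volumeSpec is a simplified stand-in for corev1.PersistentVolumeSpec.
type volumeSpec struct {
	StorageClassName string
	Capacity         map[string]string
}

// suggestionConfig is a simplified stand-in for the suggestion config data.
type suggestionConfig struct {
	PersistentVolumeSpec volumeSpec
}

// hasVolumeSpec reports whether the config carries a non-empty volume spec.
// DeepEqual is used because the spec contains a map and cannot be compared with ==.
func hasVolumeSpec(cfg suggestionConfig) bool {
	return !reflect.DeepEqual(cfg.PersistentVolumeSpec, volumeSpec{})
}

// needsDesiredVolume is a hypothetical helper: the volume is wanted when the
// config provides a volume spec or the resume policy is FromVolume.
func needsDesiredVolume(cfg suggestionConfig, policy ResumePolicyType) bool {
	return hasVolumeSpec(cfg) || policy == FromVolume
}

func main() {
	empty := suggestionConfig{}
	withSpec := suggestionConfig{PersistentVolumeSpec: volumeSpec{StorageClassName: "standard"}}

	fmt.Println(needsDesiredVolume(empty, "LongRunning"))
	fmt.Println(needsDesiredVolume(empty, FromVolume))
	fmt.Println(needsDesiredVolume(withSpec, "LongRunning"))
}
```

As noted in the comment above, this does not cover a user who wants the default volume values and still wants the volume attached, since an empty spec looks the same as no spec.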
Always get desired volume if Suggestion Config Data contains Persistent Volume Spec or if Resume Policy = From Volume.
The default values for PVC, we can set under the Desired Volume function.
Maybe this idea is better, since it avoids adding more parameters to katib-config.
Any other ideas how to avoid more parameters in Katib Config
I have no good ideas.
I agree with removing the new parameter 👍🏽
I'm not sure what a good default is for the mounts, or whether there is one; it seems like that would be setup-dependent. One thought would be to prevent the suggestion service from starting if the mount doesn't exist in the config (making the config issue apparent up front) for services that require one? Currently PBT, for example, basically does a random search at the initial seed until run limits are hit, which is a bit opaque unless you read the documentation for the config needed to operate correctly.
One thought would be to prevent the suggestion service from starting if the mount doesn't exist in the config (making the config issue apparent up front) for services that require one?
@a9p How are we going to determine which Suggestion services require the Volume?
I was originally thinking something like this, but hardcoding this in the suggestion controller isn't ideal. Possibly linking the required resources to the suggestion service and reading that in at the controller (e.g. an API change to the RPC endpoint implemented by the suggestion, defaulting to no additional requirements)?
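To make the idea a bit more concrete, here is a minimal sketch in plain Go rather than gRPC. All names (Requirements, RequirementsReporter, requirementsOf, validate, and the two service types) are hypothetical; none of them exist in Katib. Services that do not report anything default to no additional requirements, and the controller-side check fails fast when a required volume is not configured.

```go
package main

import (
	"errors"
	"fmt"
)

// Requirements describes extra resources a suggestion service needs (hypothetical).
type Requirements struct {
	NeedsVolume bool
}

// RequirementsReporter is an optional interface a suggestion service could implement.
type RequirementsReporter interface {
	Requirements() Requirements
}

// randomSearch does not implement RequirementsReporter, so it gets the defaults.
type randomSearch struct{}

// pbt declares that it cannot operate correctly without a volume.
type pbt struct{}

func (pbt) Requirements() Requirements { return Requirements{NeedsVolume: true} }

// requirementsOf returns the reported requirements, defaulting to none.
func requirementsOf(svc interface{}) Requirements {
	if r, ok := svc.(RequirementsReporter); ok {
		return r.Requirements()
	}
	return Requirements{}
}

var errVolumeRequired = errors.New("suggestion service requires a volume, but none is configured")

// validate surfaces the config problem before the service is started.
func validate(svc interface{}, volumeConfigured bool) error {
	if requirementsOf(svc).NeedsVolume && !volumeConfigured {
		return errVolumeRequired
	}
	return nil
}

func main() {
	fmt.Println(validate(randomSearch{}, false))
	fmt.Println(validate(pbt{}, false))
	fmt.Println(validate(pbt{}, true))
}
```

With an actual RPC, the default would come from the endpoint being unimplemented or returning an empty message, which keeps existing suggestion services working unchanged.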
Hi @andreyvelich / @tenzen-y! Curious if either of you have any thoughts on the above?
@a9p Apologies for the late reply. Can we please discuss it again after the Kubeflow release (e.g. in 2 weeks)?
@andreyvelich / @tenzen-y -- just following up if either of you have any thoughts on the best way to include the k8s spec in this repo?
Sorry for the late response. I'm missing the notifications. I came back to this thread right now.
I'll check your changes.
Is git submoduling in the manager/v1beta1 directory the preferred approach? I put a temporary shim in build.sh tying it to the minor k8s version just to verify static checks, but the code should still fail at runtime since it's deleting the directory.
@a9p Thanks for investigating.
Can we store the proto definition for k8s.io/apps/v1 only when generating code? I guess we can store the k8s proto file locally like the following.
I guess that we can do that with GOPATH=/tmp go install k8s.io/api.
I appreciate your effort!
}

// If ResumePolicy = FromVolume (or overridden), volume is reconciled for suggestion
if suggestionConfigData.VolumeForceMount || instance.Spec.ResumePolicy == experimentsv1beta1.FromVolume {
I think we need to identify the other places where we check whether the Suggestion has a volume and act on it.
For example, the controller deletes the volume when the Experiment is finished:
} else if instance.Spec.ResumePolicy == experimentsv1beta1.FromVolume {
Maybe we can update the Suggestion CR's API with volumeAttached: bool in addition to resumePolicy.
Then we can make the appropriate changes based on this spec setting.
We create the Suggestion CR before reconciling the Suggestion, so you don't need to get the Katib Config before reconciling the Suggestion volume.
What do you think about this API change, @tenzen-y @johnugeorge @a9p?
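For discussion, the proposed field might look roughly like the sketch below. SuggestionSpec here is a cut-down stand-in for the real CR spec, the JSON tag for VolumeAttached follows the name used in this comment, and usesVolume is a hypothetical predicate that the volume-related code paths could share instead of each checking resumePolicy directly.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ResumePolicyType is a local stand-in; the values are assumed for illustration.
type ResumePolicyType string

const (
	LongRunning ResumePolicyType = "LongRunning"
	FromVolume  ResumePolicyType = "FromVolume"
)

// SuggestionSpec is a cut-down stand-in for the Suggestion CR spec.
type SuggestionSpec struct {
	ResumePolicy ResumePolicyType `json:"resumePolicy,omitempty"`
	// VolumeAttached is the proposed field; it does not exist in the current API.
	VolumeAttached bool `json:"volumeAttached,omitempty"`
}

// usesVolume is a hypothetical single predicate for every place that needs to
// know whether the Suggestion has a volume.
func usesVolume(spec SuggestionSpec) bool {
	return spec.VolumeAttached || spec.ResumePolicy == FromVolume
}

func main() {
	spec := SuggestionSpec{ResumePolicy: LongRunning, VolumeAttached: true}
	out, err := json.Marshal(spec)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out), usesVolume(spec))
}
```

Because the field uses omitempty and defaults to false, existing Suggestion objects would serialize exactly as before.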
What are the conditions under which katib-controller sets volumeAttached to true? Do you plan to set it only when Spec.ResumePolicy == experimentsv1beta1.FromVolume? Or would you also check whether status.phase == Bound, and so on, in the PVC?
One more case where it would be true is where the service needs a volume independent of ResumePolicy (this is what I think the original issue was pointing at separating)
…po to enable protoc compilation. Go outputs look to correctly use upstream package.
…thmSpec.suggestion_spec (DeploymentSpec) or resumePolicy is FromVolume
/hold
Hi @a9p, do you have any free time to complete this PR so we can merge it as part of our next release (Katib release-0.16)?
Hi @andreyvelich, it doesn't look like I will be able to wrap up this PR before feature freeze. Could we push this to the next release target? I should have a fair amount more time to work on this closer to October.
Sure, no problem @a9p!
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
@a9p Hi, are you still here?
/remove-lifecycle stale
Still, we should move this forward.
This pull request has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/remove-lifecycle stale
What this PR does / why we need it:
PBT currently relies on resumePolicy to be FromVolume to run properly. The proposed changes follow up on previous discussion on removing the requirement for users specifying this field and instead handling the logic automatically for suggestions that require it.
Which issue(s) this PR fixes:
Fixes #1893
Checklist: