-
Notifications
You must be signed in to change notification settings - Fork 733
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement webhook validation for the TFJob #2051
Implement webhook validation for the TFJob #2051
Conversation
Signed-off-by: Yuki Iwai <[email protected]>
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: tenzen-y The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/hold |
/assign @andreyvelich @johnugeorge This is the next PR to introduce webhook validations. PTAL, thanks. |
Some tests are timing out |
I guess that some tests reach GHA limitations... |
This error seems to be related it: docker/buildx#841 |
/lgtm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this @tenzen-y!
A few comments.
func IsChiefOrMaster(typ ReplicaType) bool { | ||
return typ == TFJobReplicaTypeChief || typ == TFJobReplicaTypeMaster | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tenzen-y Does it make sense to move this function in webhooks/tensorflow/tfjob_webhook.go
since we use it there and it will help us to avoid creation of additional file named tensorflow_types_test.go
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. I'm wondering if we can put this function in the webhook package.
However, I decided to put this here since this function is not only for the webhooks like this: https://github.com/search?q=repo%3Akubeflow%2Ftraining-operator%20IsChiefOrMaster&type=code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tenzen-y Do we really need this check in the controller if we already verify if TFJob contains Chief or Master here: https://github.com/kubeflow/training-operator/blob/c20422067e3ef81df39d03c6f285353344d8f77d/pkg/controller.v1/tensorflow/tfjob_controller.go#L452C6-L452C31 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we need it since the webhook verifies only if the master/chief role replica is 0 or 1.
When TFJob doesn't have any master/chief role, the training operator switches there based on the check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, since we need additional check to verify if rtype
is Chief of Master
if errors := apimachineryvalidation.NameIsDNS1035Label(job.Name, false); len(errors) != 0 { | ||
allErrs = append(allErrs, field.Invalid(field.NewPath("metadata").Child("name"), job.Name, fmt.Sprintf("should match: %v", strings.Join(errors, ",")))) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we move this validation to the helper function in /webhooks/utils/utils.go
since this check is common to all Framework ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can commonize these functions, but as I mentioned in the description, commonizing in a separate PR would be better. Because the minimum validation change could prove that webhook validations would not bring any bugs/breaks.
Although we potentially commonize podTemplate validations, I leave such commonize in the follow-ups.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, that sounds good @tenzen-y!
We might need to track it in the separate issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created: #2054
for idx, container := range rSpec.Template.Spec.Containers { | ||
if container.Image == "" { | ||
allErrs = append(allErrs, field.Required(containerPath.Index(idx).Child("image"), "must be required")) | ||
} | ||
if container.Name == trainingoperator.TFJobDefaultContainerName { | ||
defaultContainerPresent = true | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to above comment maybe that could be moved to the helper function in which we pass container name for the check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @tenzen-y!
/lgtm
func IsChiefOrMaster(typ ReplicaType) bool { | ||
return typ == TFJobReplicaTypeChief || typ == TFJobReplicaTypeMaster | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, since we need additional check to verify if rtype
is Chief of Master
Thanks all! /hold cancel |
Signed-off-by: Yuki Iwai <[email protected]>
Signed-off-by: Yuki Iwai <[email protected]>
What this PR does / why we need it:
I implemented webhook validations for the TFJob.
Additionally, I didn't add any additional validations. The traininig-operator has the same validations the same as before.
Although we potentially commonize podTemplate validations, I leave such commonize in the follow-ups.
This PR depends on #2035
Which issue(s) this PR fixes (optional, in
Fixes #<issue number>, #<issue number>, ...
format, will close the issue(s) when PR gets merged):Part-of #1993
Checklist: