feature: add Helm chart for jobset #785

Merged: 6 commits, Feb 19, 2025

@ChenYi015 ChenYi015 commented Feb 17, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

Which issue(s) this PR fixes:

Close #726

Special notes for your reviewer:

As discussed in kubeflow/trainer#2435 (comment), this PR is based on #760.
One can install the chart as follows:

helm install jobset charts/jobset \
    --namespace jobset-system \
    --create-namespace

For Helm chart development, one can generate the Helm chart README.md by make helm-docs and run the Helm unit tests by make helm-unittest.

Does this PR introduce a user-facing change?

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Feb 17, 2025

linux-foundation-easycla bot commented Feb 17, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot
Contributor

Welcome @ChenYi015!

It looks like this is your first PR to kubernetes-sigs/jobset 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/jobset has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 17, 2025
@k8s-ci-robot
Contributor

Hi @ChenYi015. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Feb 17, 2025

netlify bot commented Feb 17, 2025

Deploy Preview for kubernetes-sigs-jobset canceled.

Name Link
🔨 Latest commit c6920a5
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-jobset/deploys/67b5f17c4d25ff0008e6b2ad

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Feb 17, 2025
@kannon92
Contributor

/ok-to-test

minor edits needed but this is looking good. I will test it later today.

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 17, 2025
@kannon92
Contributor

/hold

Good news is that the helm chart installs without issue.

Bad news is that I can't actually run any JobSet with this helm chart.

I get panics for some of the examples and others complain about validation.

Can you confirm on your end that you can run some of the examples?

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 17, 2025
Contributor

@andreyvelich andreyvelich left a comment

Thank you for doing this @ChenYi015! I left my initial comments.

Should we also introduce this script to JobSet to sync manifests between
Helm <-> Kustomize ?

@ChenYi015
Contributor Author

Bad news is that I can't actually run any JobSet with this helm chart.

I get panics for some of the examples and others complain about validation.

Can you confirm on your end that you can run some of the examples?

@kannon92 I have updated the PR, and it now works fine: the example JobSet is reconciled properly.

@ChenYi015
Contributor Author

Should we also introduce this script to JobSet to sync manifests between
Helm <-> Kustomize ?

I will raise another issue to track it and implement it later.

@ChenYi015
Contributor Author

ChenYi015 commented Feb 19, 2025

For the initial version of the jobset Helm chart, I think this PR is ready to be merged. The values are simplified to keep only a minimal set of params. In the future, we will extend the configuration if users request it. The syncing between Helm and Kustomize, the Helm chart related CI workflows, and the CRD upgrade approach can be implemented in the future.
cc @kannon92 @ahg-g @andreyvelich

Contributor

@andreyvelich andreyvelich left a comment

Thank you for this effort @ChenYi015!
I left a few small comments. We should remove values from the manifests that we don't use currently. For example:

{{ include "jobset.controller.serviceAccount.name" . }}

I think after that we should be ready to merge it.

Comment on lines +45 to +47
leaderElection:
  # -- Whether to enable leader election for jobset controller.
  enable: true
Contributor

Can we please move the configurations that can be set with the Config API to the managerConfig section in the Helm chart values? Similar to Kueue: https://github.com/kubernetes-sigs/kueue/blob/main/charts/kueue/values.yaml#L70-L72.

I believe that makes it clearer for users that these parameters are part of the manager config.
WDYT @ChenYi015 @tenzen-y @kannon92?

Contributor Author

From my experience, I think we do not have to expose the Config API to end users; doing so would make the chart more complex to configure. Besides, for example, if one tries to configure the webhook service name and secret name by leveraging Config API, it will not work since the actual name of webhook service and secret is templated by Helm.

type InternalCertManagement struct {
	// Enable controls whether to enable internal cert management or not.
	// Defaults to true. If you want to use a third-party management, e.g. cert-manager,
	// set it to false. See the user guide for more information.
	Enable *bool `json:"enable,omitempty"`

	// WebhookServiceName is the name of the Service used as part of the DNSName.
	// Defaults to jobset-webhook-service.
	WebhookServiceName *string `json:"webhookServiceName,omitempty"`

	// WebhookSecretName is the name of the Secret used to store CA and server certs.
	// Defaults to jobset-webhook-server-cert.
	WebhookSecretName *string `json:"webhookSecretName,omitempty"`
}
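
To make the mismatch concern concrete, here is a minimal, self-contained sketch. It is not code from JobSet or from the chart: `applyDefaults`, `namesMatch`, and `ptr` are hypothetical helpers, the defaults are taken only from the field comments in the struct, and the release-prefixed resource names are an assumed example of what Helm templating might produce.

```go
package main

import "fmt"

// InternalCertManagement mirrors the fields of the struct quoted in this comment.
type InternalCertManagement struct {
	Enable             *bool   `json:"enable,omitempty"`
	WebhookServiceName *string `json:"webhookServiceName,omitempty"`
	WebhookSecretName  *string `json:"webhookSecretName,omitempty"`
}

// ptr returns a pointer to v (hypothetical helper for this sketch).
func ptr[T any](v T) *T { return &v }

// applyDefaults fills nil fields with the defaults stated in the field comments.
// Hypothetical helper; the real defaulting logic may differ.
func applyDefaults(c InternalCertManagement) InternalCertManagement {
	if c.Enable == nil {
		c.Enable = ptr(true)
	}
	if c.WebhookServiceName == nil {
		c.WebhookServiceName = ptr("jobset-webhook-service")
	}
	if c.WebhookSecretName == nil {
		c.WebhookSecretName = ptr("jobset-webhook-server-cert")
	}
	return c
}

// namesMatch reports whether the names the config expects are the names
// of the Service and Secret that were actually rendered.
func namesMatch(c InternalCertManagement, serviceName, secretName string) bool {
	c = applyDefaults(c)
	return *c.WebhookServiceName == serviceName && *c.WebhookSecretName == secretName
}

func main() {
	cfg := InternalCertManagement{}

	// Assumed example: a release-prefixed naming scheme does not match the defaults.
	fmt.Println(namesMatch(cfg,
		"my-release-jobset-webhook-service",
		"my-release-jobset-webhook-server-cert")) // false

	// Unprefixed names match the documented defaults.
	fmt.Println(namesMatch(cfg,
		"jobset-webhook-service",
		"jobset-webhook-server-cert")) // true
}
```

If the names are set in one place and the resources are named in another, a check of this kind is what would have to hold for the webhook certificates to be found.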

Contributor

@andreyvelich andreyvelich Feb 19, 2025

if one tries to configure the webhook service name and secret name by leveraging Config API, it will not work since the actual name of webhook service and secret is templated by Helm.

I guess in that case we should duplicate the information for the Config and for the actual k8s resources, like here: https://github.com/kubernetes-sigs/kueue/blob/main/charts/kueue/values.yaml#L148-L157
I just feel that if we don't have the Config YAML in the chart values, it would be hard for users to understand that some values are going to be inserted into the manager config.

Maybe you know of a better solution? @astefanutti @tenzen-y @dongjiang1989

Comment on lines +98 to +106
certManager:
  # -- Whether to use cert-manager to generate certificates for the jobset webhook.
  enable: false

  # -- The reference to the issuer.
  # If empty, self-signed issuer will be created and used.
  issuerRef: {}
  #   name: selfsigned
  #   kind: ClusterIssuer
Contributor Author

For cert-manager, these params are kept to allow users to enable/disable cert-manager and use a custom issuer.

Contributor

@andreyvelich andreyvelich left a comment

Thank you for this great contribution @ChenYi015!
I think we can address the remaining comments in follow-up PRs.
/lgtm
/assign @kannon92 @ahg-g @tenzen-y

@k8s-ci-robot
Contributor

@andreyvelich: changing LGTM is restricted to collaborators

In response to this:

Thank you for this great contribution @ChenYi015!
/lgtm
/assign @kannon92 @ahg-g @tenzen-y

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Signed-off-by: Yi Chen <[email protected]>
@ahg-g
Contributor

ahg-g commented Feb 19, 2025

Thanks a lot @ChenYi015, and thanks @andreyvelich for the thorough review. I hope we continue this collaboration and you folks become maintainers on this repo as well if you like :)

I will leave it to @kannon92 to approve since he was the primary maintainer looking at this PR.


prometheus:
  # -- Whether to enable Prometheus metrics exporting.
  enable: true
Contributor

Can you disable this by default?

Kustomize doesn't require this.

We can do this as a follow up.

@kannon92
Contributor

/lgtm
/approve

I tested this and other than prometheus being required by default, it seemed to work fine!

We can address the Prometheus default as a follow-up, but I was able to test by disabling it, so I think this is good to go.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 19, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ChenYi015, kannon92

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 19, 2025
@ahg-g
Contributor

ahg-g commented Feb 19, 2025

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 19, 2025
@k8s-ci-robot k8s-ci-robot merged commit 9308c01 into kubernetes-sigs:main Feb 19, 2025
13 checks passed
@ChenYi015 ChenYi015 deleted the feature/helm-chart branch February 20, 2025 02:49
@kannon92
Contributor

@ChenYi015 @andreyvelich
Are the follow-ups needed before we release this? I’d like to create a release soon.

@ChenYi015
Contributor Author

I am going to raise another PR today to disable Prometheus metrics exporting by default in the Helm chart. Besides, I think we need to publish the Helm chart so users can install it like this:

helm repo add jobset https://kubernetes-sigs.github.io/jobset
helm install jobset jobset/jobset \
      --namespace jobset-system \
      --create-namespace

@andreyvelich
Contributor

@kannon92
Contributor

cc @ahg-g

I'm happy to wait until we have a site for the Helm chart, but I'm not sure what your customers want.

@kannon92
Contributor

@tenzen-y do we publish the charts in Kueue anywhere?

@ChenYi015
Contributor Author

How do we want to publish them ? Similar to KServe and Spark Operator ?

Sorry for the late response. I think there is another way to publish the charts, see #790 (comment).

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Helm chart for JobSet
7 participants