Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP-2170: Add the manifests overlay for Kubeflow Training V2 #2382

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open
84 changes: 84 additions & 0 deletions manifests/overlays/kubeflow-platform/kubeflow-trainer-roles.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,84 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: kubeflow-trainer-admin
labels:
rbac.authorization.kubeflow.org/aggregate-to-kubeflow-admin: "true"
aggregationRule:
clusterRoleSelectors:
- matchLabels:
rbac.authorization.kubeflow.org/aggregate-to-kubeflow-trainer-admin: "true"
rules: []

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: kubeflow-trainer-edit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, we also should have permission to read logs from TrainJob's pods.

labels:
rbac.authorization.kubeflow.org/aggregate-to-kubeflow-edit: "true"
rbac.authorization.kubeflow.org/aggregate-to-kubeflow-trainer-admin: "true"
rules:
- apiGroups:
- kubeflow.org
resources:
- clustertrainingruntimes
- trainingruntimes
- trainjobs
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- kubeflow.org
resources:
- trainjobs/status
verbs:
- get
- apiGroups:
- ""
resources:
- persistentvolumeclaims
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not required for now, please remove it.

verbs:
- create
- delete
- get
- list
- watch
- apiGroups:
- ""
resources:
- events
verbs:
- get
- list
- watch

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: kubeflow-trainer-view
labels:
rbac.authorization.kubeflow.org/aggregate-to-kubeflow-view: "true"
rules:
- apiGroups:
- kubeflow.org
resources:
- clustertrainingruntimes
- trainingruntimes
- trainjobs
verbs:
- get
- list
- watch
- apiGroups:
- kubeflow.org
resources:
- trainjobs/status
verbs:
- get
23 changes: 23 additions & 0 deletions manifests/overlays/kubeflow-platform/kustomization.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: kubeflow
resources:
- ../../base/crds
- ../../base/manager
- ../../base/rbac
- ../../base/webhook
- ../../base/runtimes/pretraining
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The runtimes must be installed after controller-manager is ready.
Are we ok with that ?

- ../../third-party/jobset # Comment this line if JobSet is installed on the Kubernetes cluster.
- kubeflow-trainer-roles.yaml

# Update the Kubeflow Trainer controller manager image tag.
images:
- name: kubeflow/trainer-controller-manager
newTag: latest

# Secret for the Kubeflow Training webhook.
secretGenerator:
- name: kubeflow-trainer-webhook-cert
namespace: kubeflow
options:
disableNameSuffixHash: true
Loading