Not able to achieve successful binding rate beyond ~300 pods/sec #71

Open
rishabh325 opened this issue Jan 7, 2025 · 2 comments

@rishabh325

We are not able to achieve a successful binding rate beyond ~300 pods/sec. When running the binder in active-active mode we see a high conflict rate, while in active-passive mode the overall binding rate stays below ~300 pods/sec.

Configurations:
Nodes: 30k
Pods: ~150k
Creation Rate: ~1.75k pods/sec via clusterloader

Case 1:

Service      Leader Elected   Instances   Resource Limit
Dispatcher   No               4           2 instances w/ 32 cores/180Gi
Scheduler    No               2           2 instances w/ 32 cores/250Gi
Binder       No               4           2 instances w/ 32 cores/180Gi
[Screenshot: 2025-01-07 at 11.30.07 AM]

Case 2:

Service      Leader Elected   Instances   Resource Limit
Dispatcher   No               4           2 instances w/ 32 cores/180Gi
Scheduler    No               2           2 instances w/ 32 cores/250Gi
Binder       Yes              2           2 instances w/ 32 cores/180Gi
[Screenshot: 2025-01-07 at 11.32.39 AM]

@binacs (Member) commented Feb 2, 2025

Apologies for the temporary absence of a deployment guide for multi-instance setups, which may have caused confusion. We are working to improve this documentation as soon as possible.

In the architecture of the Godel distributed scheduler, only one dispatcher and one binder instance are expected to be active at any given time. Multiple dispatcher/binder instances are deployed for high availability and must utilize leader election to prevent conflicts.

For multiple scheduler instances, there are two possible scenarios (a rough configuration sketch follows this list):

  1. If multiple instances belong to the same shard, they should share the same scheduler name and enable leader election.
  2. If instances belong to different shards, each shard's instances should be assigned a unique scheduler name.
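
To make this concrete, below is a rough sketch of how the components might be started under these rules. It is an illustration only: the binary names, the --leader-elect / --scheduler-name flags, and the shard names are assumptions based on common Kubernetes component conventions rather than confirmed options, so please verify the exact flags against each binary's --help and the example manifests in the repository.

# Illustration only; binary names, flag spellings, and shard names are assumptions.
# Dispatcher and binder: run several replicas for HA, but leader election
# ensures only one instance is active at any time.
$ dispatcher --leader-elect=true
$ binder --leader-elect=true

# Scheduler replicas belonging to the same shard: share one scheduler name
# and elect a leader among themselves.
$ scheduler --leader-elect=true --scheduler-name=godel-scheduler-shard-a

# Scheduler replicas belonging to a different shard: a distinct scheduler name,
# so the shards can work concurrently without conflicting.
$ scheduler --leader-elect=true --scheduler-name=godel-scheduler-shard-b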

If you have any further questions, please feel free to continue the discussion in this issue.

@rishabh325 (Author) commented Feb 10, 2025

Thanks for getting back on this.

We tried the suggested configuration, i.e.:

  1. Enabling leader election on dispatcher/binder
  2. Running multiple instances of scheduler with unique names

Below are the run details:

Nodes: 30k
Pods: ~150k
Creation Rate: ~1.75k pods/sec

Service      Leader Elected   Instances   Resource Limit
Dispatcher   Yes              2           2 instances w/ 32 cores/180Gi
Scheduler    No               5           2 instances w/ 32 cores/250Gi
Binder       Yes              2           2 instances w/ 32 cores/180Gi

$ kubectl get leases -n godel-system
NAME         HOLDER                                          AGE
binder       phx5-z93_0b1bcd50-7822-4497-8270-94bed85d89b6   63d
dispatcher   phx5-2tv_0d96b8bb-5f19-4023-9dd7-14f79b59268e   63d

$ kubectl get schedulers -A
NAME                       AGE
godel-scheduler-phx5-3fq   36m
godel-scheduler-phx5-4sc   4d23h
godel-scheduler-phx5-6kt   4d23h
godel-scheduler-phx5-6yq   4d23h
godel-scheduler-phx5-uhp   4d23h

$ kubectl get schedulers godel-scheduler-phx5-3fq -o yaml
apiVersion: scheduling.godel.kubewharf.io/v1alpha1
kind: Scheduler
metadata:
  creationTimestamp: "2025-02-10T17:35:06Z"
  generation: 1
  name: godel-scheduler-phx5-3fq
  resourceVersion: "30439568215"
  uid: b3c4b9d0-d608-4448-a9a7-7dd00c433934
spec: {}
status:
  lastUpdateTime: "2025-02-10T18:12:07Z"

The binding rate still hovers around ~300 pods/sec. Also, FYI, we have enabled the DispatcherNodeShuffle feature gate to see whether node sharding helps.
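
For reference, the way one would typically enable it is via the standard feature-gates flag; this assumes the dispatcher follows the usual Kubernetes component-base convention, so the exact flag spelling and binary name should be confirmed against its --help.

# Assumption: standard component-base style feature-gate flag; verify with --help.
$ dispatcher --feature-gates=DispatcherNodeShuffle=true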

Can you also share any documentation on the various scheduling/queue metrics so we can identify the bottleneck here?
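
In the meantime, a generic way to take a first look is to port-forward to one of the component pods and scrape its Prometheus endpoint; the pod name and port below are placeholders, and it is an assumption that the components expose /metrics at all.

# Placeholder pod name and port; assumes the binder exposes a Prometheus /metrics endpoint.
$ kubectl -n godel-system port-forward pod/<binder-pod> 10251:10251
$ curl -s http://localhost:10251/metrics | grep -i -e bind -e queue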
