Not able to achieve successful binding rate beyond ~300 pods/sec #71
Comments
Apologies for the temporary absence of a deployment guide for multi-instance setups, which may have caused confusion. We are working to improve this documentation as soon as possible. In the architecture of the Godel distributed scheduler, only one dispatcher and one binder instance are expected to be active at any given time. Multiple dispatcher/binder instances are deployed for high availability and must utilize leader election to prevent conflicts. For multiple scheduler instances, there are two possible scenarios:
If you have any further questions, please feel free to continue the discussion in the issue.
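To illustrate the active-passive arrangement described above, here is a minimal in-memory sketch of lease-based leader election. It is not Godel's or Kubernetes' implementation: the `Lease` class, the `try_acquire_or_renew` method, the instance names `binder-a`/`binder-b`, and the 15-second lease duration are all hypothetical, chosen only to show why a second instance stays idle while the leader keeps renewing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    """Hypothetical in-memory lease; real deployments use a shared lease object."""
    holder: Optional[str] = None
    renew_time: float = 0.0
    duration: float = 15.0  # seconds; illustrative value, not a Godel default

    def try_acquire_or_renew(self, candidate: str, now: float) -> bool:
        """Return True if `candidate` is the leader after this call."""
        expired = self.holder is None or (now - self.renew_time) > self.duration
        if self.holder == candidate or expired:
            # Either the current leader renews, or a standby takes over
            # because the previous holder stopped renewing in time.
            self.holder = candidate
            self.renew_time = now
            return True
        return False


lease = Lease()
a_leads = lease.try_acquire_or_renew("binder-a", now=0.0)     # first caller acquires
b_blocked = lease.try_acquire_or_renew("binder-b", now=5.0)   # lease still valid
a_renews = lease.try_acquire_or_renew("binder-a", now=10.0)   # leader renews
b_takes_over = lease.try_acquire_or_renew("binder-b", now=30.0)  # 20 s since last renewal
```

In this sketch only the lease holder would do any binding, so running two binders this way adds availability but not throughput.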
Thanks for getting back to us on this. We tried the suggested configurations, i.e.
Below are the run details: Nodes: 30k
$ kubectl get leases -n godel-system
$ kubectl get schedulers -A
$ kubectl get schedulers godel-scheduler-phx5-3fq -o yaml
The binding rate still hovers around ~300 pods/sec. Also, FYI, we have enabled
Can you also share any documentation on the various scheduling/queue metrics, so we can see where the bottleneck is?
We are not able to achieve a successful binding rate beyond ~300 pods/sec. When running the binder in active-active mode, we get a high conflict rate, while in active-passive mode the overall binding rate stays below ~300 pods/sec.
Configurations:
Nodes: 30k
Pods: ~150k
Creation Rate: ~1.75k pods/sec via clusterloader
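For a sense of scale, the figures above can be turned into a rough backlog estimate. This is a back-of-the-envelope sketch that assumes both rates are constant and takes the approximate values (~150k pods, ~1.75k pods/sec created, ~300 pods/sec bound) at face value; the function name `backlog_estimate` is made up for illustration.

```python
def backlog_estimate(total_pods: float, creation_rate: float, binding_rate: float):
    """Estimate timings and peak pending backlog under constant rates."""
    creation_time = total_pods / creation_rate   # seconds until all pods are created
    binding_time = total_pods / binding_rate     # seconds until all pods are bound
    # While pods are still being created, the backlog grows at the rate difference.
    peak_backlog = max(0.0, (creation_rate - binding_rate) * creation_time)
    return creation_time, binding_time, peak_backlog


creation_s, binding_s, peak = backlog_estimate(150_000, 1_750, 300)
print(f"create: {creation_s:.0f} s, bind: {binding_s:.0f} s, peak backlog: {peak:.0f} pods")
```

Under these assumptions, creation finishes in roughly 86 seconds while binding needs about 500 seconds, so around 124k pods would be waiting at the peak.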
Case 1:
Case 2: