WIP: TPU Provisioner: Node pool hash comparison #967
base: main
Conversation
```go
func nodePoolSelectiveHash(np *containerv1beta1.NodePool) (string, error) {
	h := fnv.New32a()
	npToHash := &containerv1beta1.NodePool{
		Config: &containerv1beta1.NodeConfig{
			// ... (rest of the quoted diff omitted)
```
Didn't include `ReservationAffinity: reservation`. That means there will be no case of flipping between reservation and on-demand (at least in the near term), since both are treated as the same tier today.
Good catch
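For readers following along, here is a minimal sketch of how such a selective hash might be computed. The specific fields copied into `npToHash` (MachineType, Labels, InitialNodeCount) and the JSON-serialize-then-FNV approach are assumptions for illustration, not necessarily what this PR does:

```go
package main

import (
	"encoding/json"
	"fmt"
	"hash/fnv"

	containerv1beta1 "google.golang.org/api/container/v1beta1"
)

// nodePoolSelectiveHash hashes only the NodePool fields that determine whether
// an existing node pool is still compatible with a workload. The field
// selection below is illustrative; note that ReservationAffinity is left out,
// per the review comment above.
func nodePoolSelectiveHash(np *containerv1beta1.NodePool) (string, error) {
	h := fnv.New32a()
	npToHash := &containerv1beta1.NodePool{
		Config: &containerv1beta1.NodeConfig{
			MachineType: np.Config.MachineType, // assumed field choice
			Labels:      np.Config.Labels,      // assumed field choice
		},
		InitialNodeCount: np.InitialNodeCount, // assumed field choice
	}
	// Serialize the selected fields and hash the resulting bytes.
	b, err := json.Marshal(npToHash)
	if err != nil {
		return "", err
	}
	if _, err := h.Write(b); err != nil {
		return "", err
	}
	return fmt.Sprintf("%x", h.Sum32()), nil
}
```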
Updated to include an interface for interacting with the GKE NodePool API in order to add some tests.
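A rough sketch of what such an interface could look like, so the provisioner logic can be exercised against a fake in tests (the interface and method names here are hypothetical, not taken from the PR):

```go
package main

import (
	"context"

	containerv1beta1 "google.golang.org/api/container/v1beta1"
)

// nodePoolClient abstracts the GKE NodePool API calls the provisioner makes,
// allowing a fake implementation to be injected in unit tests.
// All names are hypothetical.
type nodePoolClient interface {
	GetNodePool(ctx context.Context, name string) (*containerv1beta1.NodePool, error)
	CreateNodePool(ctx context.Context, np *containerv1beta1.NodePool) error
	DeleteNodePool(ctx context.Context, name string) error
}
```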
Issue:
Today the provisioner just checks whether a node pool with the expected name exists for a given workload. If the workload is quickly recreated with a different node selector, it is possible that the provisioner will not delete the old node pool in time, and it will then skip creating a new one because a node pool with the expected name already exists.
Fix:
This PR introduces new logic into the provisioner to check that the existing node pool matches the desired node pool via a hash comparison. The hash is calculated at node pool creation time and stored as a node pool label for later comparison.
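As a concrete illustration of the comparison described above, here is a minimal sketch, assuming the hash is stored as a node label under the node pool's Config.Labels; the label key, helper name, and reuse of the nodePoolSelectiveHash sketch from the conversation above are all assumptions:

```go
// Assumed label key under which the provisioner stores the hash at creation time.
const nodePoolHashLabel = "tpu-provisioner-nodepool-hash"

// needsRecreate reports whether the existing node pool no longer matches the
// desired spec for the workload, by comparing the hash stored at creation time
// against a freshly computed hash of the desired node pool.
func needsRecreate(existing, desired *containerv1beta1.NodePool) (bool, error) {
	desiredHash, err := nodePoolSelectiveHash(desired) // see sketch above
	if err != nil {
		return false, err
	}
	existingHash := existing.Config.Labels[nodePoolHashLabel]
	// A mismatch means the old node pool was created for a different node
	// selector and should be deleted and recreated rather than reused.
	return existingHash != desiredHash, nil
}
```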