Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow duplicate weighted backend keys #3319

Merged
merged 2 commits into from
Nov 7, 2024
Merged

Allow duplicate weighted backend keys #3319

merged 2 commits into from
Nov 7, 2024

Conversation

sfleen
Copy link
Collaborator

@sfleen sfleen commented Nov 4, 2024

Currently, if the proxy receives two backends with the same metadata, one of the backends will get dropped because the backend metadata is used as the key in a hash map. Attempting to then randomly distribute requests to the backends can panic when selecting the now non-existent backend.

This is fixed by no longer using backend metadata as a hash map key, instead generating separate IDs that are stored in a vec to retain the declared order of backends while also being used to look up the backend and associated weight independetly.

Validated with new unit tests excercising duplicate backend keys, as well as a few around the invariants use to store the backends.

@sfleen sfleen force-pushed the disambig branch 2 times, most recently from 501ca8b to c58b4af Compare November 4, 2024 15:46

#[derive(Debug)]
pub struct UnweightedKeys<K> {
ids: Vec<KeyId>,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the key part of this change. It decouples the backend keys from the iteration order of the backends with a separate key type. I considered using something like SlotMap, but since we don't modify or remove anything from these maps after they're created we don't need the guarantee of stable keys that it provides and the slight performance cost that comes with it.


impl<K> UnweightedKeys<K> {
pub(crate) fn new(iter: impl Iterator<Item = K>) -> Self {
let mut ids = Vec::new();
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how the invariant of the IDs and keys being a 1:1 relation is guaranteed by only allowing construction via enumeration of the iterator over keys.

}

pub(crate) fn keys(&self) -> &[K] {
pub(crate) fn make_svc_from_keys<T>(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was another key change. Instead of exposing a slice of keys to create the backend services, this inverts the control to something similar to a combinator to decouple how they keys are iterated over and mapped into services.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

b64583e

i have a commit here that i'd like to offer for consideration: we only use this to construct our Distribute<K>. we can cut down on non-local reasoning & keep our parameter object a little slimmer by defining this as a private associated function over yonder!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

.map(|k| (k.clone(), newk.new_service(k.clone())))
.collect();
Distribute::new(backends, dist)
Distribute::new(dist, |k| newk.new_service(k.clone()))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And now this is much simpler and is decoupled from how the service keys are stored.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i like the changes in this file! it's nice to not have to do this iter()...collect() dance here at this call-site.

@sfleen sfleen marked this pull request as ready for review November 4, 2024 15:56
@sfleen sfleen requested a review from a team as a code owner November 4, 2024 15:56
@cratelyn cratelyn self-requested a review November 4, 2024 16:06
Copy link

codecov bot commented Nov 4, 2024

Codecov Report

Attention: Patch coverage is 81.37931% with 27 lines in your changes missing coverage. Please review.

Project coverage is 67.57%. Comparing base (96124bc) to head (987aa8b).
Report is 546 commits behind head on main.

Files with missing lines Patch % Lines
linkerd/distribute/src/service/random.rs 71.05% 11 Missing ⚠️
linkerd/distribute/src/keys.rs 80.43% 9 Missing ⚠️
linkerd/distribute/src/service.rs 79.16% 5 Missing ⚠️
linkerd/distribute/src/service/first.rs 90.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3319      +/-   ##
==========================================
- Coverage   67.68%   67.57%   -0.11%     
==========================================
  Files         332      380      +48     
  Lines       15158    17599    +2441     
==========================================
+ Hits        10259    11892    +1633     
- Misses       4899     5707     +808     
Files with missing lines Coverage Δ
linkerd/distribute/src/params.rs 71.42% <100.00%> (-8.58%) ⬇️
linkerd/distribute/src/service/empty.rs 100.00% <100.00%> (ø)
linkerd/distribute/src/stack.rs 83.33% <100.00%> (-2.39%) ⬇️
linkerd/distribute/src/service/first.rs 90.00% <90.00%> (ø)
linkerd/distribute/src/service.rs 78.57% <79.16%> (+11.90%) ⬆️
linkerd/distribute/src/keys.rs 80.43% <80.43%> (ø)
linkerd/distribute/src/service/random.rs 71.05% <71.05%> (ø)

... and 78 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 24d0092...987aa8b. Read the comment docs.

Copy link
Collaborator

@cratelyn cratelyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here's a group of comments, most of which are nits, and one possible bug?

zooming out, i'm curious to tug at whether we can dodge the need to maintain a separate "keyspace" in order to account for duplicate keys.

i'll offer some more thoughts on that in a follow-up comment, but wanted to offer these comments in the meantime! (update 11/05: we paired on this and addressed these thoughts)

linkerd/distribute/src/service.rs Outdated Show resolved Hide resolved
linkerd/distribute/src/keys.rs Outdated Show resolved Hide resolved
linkerd/distribute/src/service.rs Outdated Show resolved Hide resolved
.map(|k| (k.clone(), newk.new_service(k.clone())))
.collect();
Distribute::new(backends, dist)
Distribute::new(dist, |k| newk.new_service(k.clone()))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i like the changes in this file! it's nice to not have to do this iter()...collect() dance here at this call-site.

linkerd/distribute/src/keys.rs Outdated Show resolved Hide resolved
linkerd/distribute/src/keys.rs Outdated Show resolved Hide resolved
linkerd/distribute/src/keys.rs Show resolved Hide resolved
linkerd/distribute/src/service.rs Show resolved Hide resolved
linkerd/distribute/src/stack.rs Outdated Show resolved Hide resolved
linkerd/distribute/src/service.rs Outdated Show resolved Hide resolved
@sfleen
Copy link
Collaborator Author

sfleen commented Nov 7, 2024

I've added an extra refactoring commit that creates separate structs for each distribution variant, which I think makes it a lot clearer and easier to understand. It's a bit more involved, so I'm more than happy to make it a follow-up PR instead if it's too complex for this one.

Copy link
Member

@olix0r olix0r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall approach looks good!

  • We should pub(crate) structs that are not part of the public interface.
  • We should use AHashMap to be consistent with the backend cache in this crate.
  • tioli: The EmptySelection is trivial and can be eliminated.

linkerd/distribute/src/keys.rs Outdated Show resolved Hide resolved
linkerd/distribute/src/keys.rs Outdated Show resolved Hide resolved
linkerd/distribute/src/service/empty.rs Outdated Show resolved Hide resolved
linkerd/distribute/src/service/empty.rs Outdated Show resolved Hide resolved
linkerd/distribute/src/service/first.rs Outdated Show resolved Hide resolved
linkerd/distribute/src/service/random.rs Outdated Show resolved Hide resolved
linkerd/distribute/src/keys.rs Show resolved Hide resolved
linkerd/distribute/src/service/random.rs Show resolved Hide resolved
Currently, if the proxy receives two backends with the same metadata,
one of the backends will get dropped because the backend metadata is
used as the key in a hash map. Attempting to then randomly distribute
requests to the backends can panic when selecting the now non-existent
backend.

This is fixed by no longer using backend metadata as a hash map key,
instead generating separate IDs that are stored in a vec to retain the
declared order of backends while also being used to look up the backend
and associated weight independetly.

Validated with new unit tests excercising duplicate backend keys, as well
as a few around the invariants use to store the backends.

Signed-off-by: Scott Fleener <[email protected]>
The existing distribution didn't have a great separation between the different kinds, like having some fields that only apply to one of the variants and not the others.

This splits out the distributions into separate structs to make the logic more clear. The structs have been moved to separate files, which makes it much easier to grok the logic for a specific distribution variant.

Signed-off-by: Scott Fleener <[email protected]>
@sfleen sfleen enabled auto-merge (squash) November 7, 2024 15:31
@sfleen sfleen merged commit b34d32b into linkerd:main Nov 7, 2024
15 checks passed
@sfleen sfleen deleted the disambig branch November 7, 2024 17:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants