Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add source metadata RFC #15
base: master
Are you sure you want to change the base?
Add source metadata RFC #15
Changes from 3 commits
0561c40
bef860d
beac023
c918aa9
d15afc0
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I understand this limitation. But can we be clearer about the problem here? What specific features / use cases are we not able to satisfy without these metrics?
Especially because this proposal only includes this metadata on meshed connections (so we're guaranteed to have the client metrics).
What do these labels enable, concretely, that we can't do otherwise? Or if we do not add them, what risks do we incur?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The next paragraph talks about the limitations we face from not having these metrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So? Is the problem "this is awkward?" or are there fundamental limitations that we incur here?
I want to be careful about looking at this benefit versus the cost of the implementation...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added some explicit examples of questions that we can't answer without source metadata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To put a finer point on this... there's a specific API call for which we return incorrect data, right? What is that API Call?
I think this may be the only problem we're trying to address with this proposal. I'd try to take out much of the other context about labels and asymmetry and focus very concretely on this issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that's quite right. I don't think any of our API calls return incorrect data, their behavior is just confusing and non-uniform.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes it sounds like its our metrics API that is the problem: that we have no way to distinguish these concepts at all.
To put it another way: if we made the changes proposed here, what impact would that have on our metrics API? Would we just subtly change the semantics of an existing call? Would we be unable to represent this call?
I view Linkerd's prometheus data as diagnostics... like log lines, basically. The format & details may change between releases as long as we satisfy our metrics API, which powers consumers within Linkerd. So, a problem statement phrased in terms of our prometheus data isn't all that compelling.
However, it absolutely does sound like a problem if there are fundamental questions that our API cannot answer about mesh traffic. How do we improve Linkerd's metrics API so that it can properly differentiate client and serverside metrics? If this requires source labels, great, I'm all in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reworked the RFC to be API focused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love the updated framing.
If it's alright with you, I'm going to suggest some further edits via PR to contextualize how I see this all fitting together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still a bit squishy... It's difficult, but we do it. I.e. for consumers of our API, there's no overhead. Right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do not present traffic metrics in a consistent way today. We're forced to either hide the distinction between server-side and client-side metrics, which is deceitful, or be explicit that certain queries are measured on one side while other queries are measured on the other, which is confusing. Today we split the difference by being vague. So I see this a problem for consumers of the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that this RFC actually addresses this issue... We already have client IDs on server metrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, though we don't use that data for this purpose today. We could use it, but it would be more straightforward to use source metadata. While this would not impact consumers of the API, I still think that having a cleaner implementation is of some value.
However, I have removed this point for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The non-mixer istio actually does this differently still. Have you taken a peek at how they're doing it now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't. I'll check it out.