Propose a stable metrics API #21
Conversation
This is a rough sketch outlining some of the properties and constraints of a stable Linkerd metrics API.
problem/0000-metrics-v1.md
> implicit, undocumented relationship between the proxy's metrics exposition structure and traffic
> in Kubernetes.
>
> The goal of this RFC is to define an external-facing API for Linkerd that:
It would be helpful to be explicit about where our current metrics API falls short of these requirements.
- The current API has well defined semantics (though I'm sure the documentation of those semantics could be improved)
- As far as I know, we haven't published any guarantees or guidelines about the stability of our API, but we also have not yet made any backwards incompatible changes.
- What does it mean to support external clients? Is there specific functionality that needs to be satisfied?
- The current API does not check RBAC, though I believe the structure of the API is well suited to adding this.
- Can you explain what this means?
- Can you explain what this means?
The current APIs are undocumented. The ambiguities you discuss in the source-metrics RFC are exactly what I mean by the semantics not being well-defined. We have no way to reason about what exactly is being measured/reported other than by reading/understanding the implementation. One potential outcome for this RFC is that we simply decide that we like the `Stat` API, document it, and provide examples of how to integrate with it. However, I don't think that belongs in the problem statement. I want to approach the metrics API with fresh eyes and to document exactly how it should work so that we have a plan for how it will grow, so that when we want to extend it, there's a clear story for doing so.
But the root problem we're solving is, as I mentioned, that it's far easier for users to just go to prometheus and get data, and we have absolutely no documentation/guarantees/etc. about this implicit API.
> What does it mean to support external clients? Is there specific functionality that needs to be satisfied?
This is more of a product requirement than a technical one. I think gRPC may be a high bar for integrators (though it's certainly an option on the table). I also think we should be looking at our options in terms of interop. If, for instance, we can expose a version of prometheus's API, we could support many external clients with very little additional work.
> - Supports extensions for additional protocols, metrics, and metadata; and
What happens if we add Kafka support? Where does that go?
What happens when we expose new proxy metrics (like HTTP errors, by reason)? Where should that go?
Basically, we need an extensibility story.
> - Supports arbitrary workload hierarchies (i.e., for custom operators).
You can think of a `Deployment` as a CRD for the deployment operator: it knows how to create ReplicaSets, which in turn create Pods (effectively, anyway).
People are creating other, bespoke workload types that create pods owned by alternate, custom resource types. We should have a story for supporting these kinds of Kubernetes extensions.
problem/0000-metrics-v1.md
> The metrics API is read-only.
>
> In order to support RBAC, all requests should be scoped to the resources on which metrics are
I like this mapping of RBAC onto the point at which the metrics are measured 👍
problem/0000-metrics-v1.md
> In order to support RBAC, all requests should be scoped to the resources on which metrics are
> measured.
>
> It is time-oriented. It should have both time-series and summary capabilities.
Are we sure this is a requirement? I can see this being necessary if API consumers are trying to build charting or historical lookups on top of this API, but is this something we want to support?
I think so. Even though Linkerd's interfaces only expose summary data, it seems extremely desirable to expose timeseries. Otherwise, we're basically forcing all of these needs through the raw underlying Prometheus interface. Though, from the impl-plan point of view, I think we could avoid doing this right away; we should still plan for the API to satisfy these needs.
I was thinking about this a little bit too. Ideally, we'd not bother and it would always be point in time. Push it back to the implementation to do time series.
Unfortunately, having this data be point in time complicates the usage. Imagine you're scraping the API and putting it in your own time-series database. Scraping on odd intervals means that you'll get imperfect data and have to reason about quite a bit in the eventual database. If we are okay requiring the backends for this API to be time-series aware, it seems like a good UX improvement to deliver some kind of time series data back to the end user.
For what it's worth, when you have a chain of federated Prometheuses, each Prometheus exposes point-in-time data to the next Prometheus in the chain which scrapes it at regular intervals. So it's not unheard of.
I'm not sure if I'm reasoning about this correctly, but I think if you have mismatched scrape intervals, the resolution of the data will effectively be the longest scrape interval. I don't think the intervals will compound in some wonky way.
That said, it seems to me like there are 3 different formats our API could expose metrics in:
- summaries over a time period (as the gRPC API does today)
- point-in-time snapshot of gauges/counters/histograms (à la hitting the /metrics or /federate endpoint)
- timeseries (like what is returned from a PromQL query)
We need to decide which of these we want to support.
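To make the distinction concrete, here's a rough sketch of what each format looks like against Linkerd's data; it assumes the proxy's `request_total` counter and a `deployment` label added by the bundled Prometheus scrape config:

```promql
# (1) Summary over a window, roughly what the gRPC Stat API returns today:
sum(increase(request_total{deployment="web"}[1h]))

# (2) Point-in-time snapshot: the raw counter value, as /metrics or /federate expose it:
request_total{deployment="web"}

# (3) Timeseries: the same data evaluated as a range query (start/end/step),
#     yielding one sample per step:
rate(request_total{deployment="web"}[5m])
```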
@adleong That all makes sense to me.
I think that if we can support the generic time-series case, the summary case becomes ~trivial (it's a `sum` or `rate` query over the interval, right?).
Yeah, assuming the API supports this (as PromQL does).
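For instance, a sketch of deriving the summary case from timeseries, assuming the proxy's `response_total` counter with its `classification` label and a `deployment` scrape label:

```promql
# Success rate for the "web" deployment over the last hour, computed entirely
# from the underlying timeseries with a sum-of-rates query:
  sum(rate(response_total{deployment="web", classification="success"}[1h]))
/
  sum(rate(response_total{deployment="web"}[1h]))
```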
problem/0000-metrics-v1.md
> It is time-oriented. It should have both time-series and summary capabilities.
>
> Each Histogram should be given sufficient, published defaults.
I don't understand what this means.
The proxy's histogram windows were chosen ~arbitrarily and probably hide meaningful data. This decision needs to be revisited as we publish the API.
And it's likely that different histograms should have different defaults; i.e., `connection_duration` needs very different resolution than `response_latency`.
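For what it's worth, the bucket boundaries also bound what consumers can compute downstream. A sketch, assuming the proxy's `response_latency_ms_bucket` histogram and a `deployment` scrape label:

```promql
# The p99 here is interpolated within the published `le` bucket boundaries,
# so the defaults directly limit the resolution of the answer:
histogram_quantile(
  0.99,
  sum by (le) (rate(response_latency_ms_bucket{deployment="web"}[5m]))
)
```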
problem/0000-metrics-v1.md
> In order to support RBAC, all requests should be scoped to the resources on which metrics are
> measured.
>
> It is time-oriented. It should have both time-series and summary capabilities.
How much history is available from this API? How can users extend the amount of history available?
Today, users can federate Linkerd's Prometheus into their own Prometheus with a longer retention period. Is it a goal of this RFC that users would no longer federate directly from Linkerd's Prometheus? Does this API support the use-case of federating Linkerd's metrics into longer term storage?
At an API level there are no restrictions; operators are able to tune/configure this to their needs. The API returns whatever data is available in the underlying storage. We will need to publish docs/config/tools/etc. for Linkerd scrape configs that provide the metrics needed to power the API (though this is out of scope for this RFC, imo... this should focus just at the API level).
Users today who want a retention period of longer than 6 hours federate the Linkerd prometheus into their own. If the plan is to declare that Linkerd's Prometheus is not part of its API then I think we need to figure out how to support this use-case.
e.g. Should the API support bulk extraction of metrics into an external longer-term storage? Or should Linkerd API surface area include knobs for increasing the internal retention period?
problem/0000-metrics-v1.md
> [fixable][rfc-src-metrics], however, if servers are able to discover metadata for inbound
> traffic.
>
> It seems like we should not support querying service metrics as a resource scope, though, because
I agree with this and it meshes nicely with the invariant that RBAC should be tied to the resource at which the metrics were recorded.
problem/0000-metrics-v1.md
> #### Linkerd Stat
>
> Linkerd's [`Stat`][stat] API is resource oriented, but it only exposes an _outbound_ side, so
> there's no way to
The stat API exposes inbound metrics when neither `to_resource` nor `from_resource` is specified.
I find this behavior very surprising and difficult to reason about. I'm looking for ways to make the API much more explicit. The fact that we have an `outbound` query makes it seem like there should be `inbound` query options that limit inbound stats (but we don't have the labels here to satisfy these requests yet).
No arguments here :)
problem/0000-metrics-v1.md
> Linkerd's [`Stat`][stat] API is resource oriented, but it only exposes an _outbound_ side, so
> there's no way to
>
> Why is `skip_stats` part of the API?
I'm not sure. This seems to be used extensively in the Linkerd dashboard, but I don't understand the dashboard code well enough to figure out why.
problem/0000-metrics-v1.md
> #### Prometheus
>
> Can we expose a [Prometheus API][prom-api] scoped under each resource? E.g.
This is very appealing and plays very nicely with RBAC.
The more I think about this, the more I like it. It also makes the transition easier for clients who are currently scraping Prometheus directly.
I've got a nagging suspicion that this isn't going to work out all that well. Or that it will be wildly complex.
What happens if we have something like `../ns/linkerd/query`? How does RBAC apply? Must I have permission to read metrics for each pod/resource in the namespace? Or do I require an explicit RBAC binding on the namespace itself? Am I able to write a query that spans namespaces? How would that work?
What if we don't try to enforce RBAC on prometheus's data at all?
A preferable model might be:
- Prometheus scrapes themselves are RBAC'd. I.e., the proxy applies access control to its own /metrics endpoints.
- The prometheus instance is the unit of view isolation. A user who can read from a prometheus instance has permission to read all of its data. If multi-tenancy is needed, users should deploy a Prometheus instance for each tenant. RBAC is used to grant each instance permission to read metrics from the appropriate resources.
- Linkerd's challenge is then:
  - RBAC enforcement (for /metrics & prometheus access)
  - When a user interacts with Linkerd UIs, how do we use the prometheus instance the user has access to?
  - Perhaps we make it easy to deploy a subset of the control plane per tenant? There's no good reason the `web` instance needs to be centralized in this world.
- We publish a prometheus "schema" and scrape configs...
Basically: "Bring your own prometheus" should support arbitrary numbers of prometheus instances. Our users who really truly want to dump all of their data into one huge prometheus instance don't get view isolation (except by limiting who can read data from their monoprom).
problem/0000-metrics-v1.md
> implicit, undocumented relationship between the proxy's metrics exposition structure and traffic
> in Kubernetes.
>
> The goal of this RFC is to define an external-facing API for Linkerd that:
Is it a goal of this RFC to add access control to Prometheus? Should accessing Prometheus directly be an available but unsupported/discouraged API? Or should it be locked down in some way?
I think we want to provide guidance/tools that allow folks to lock down their prometheus. I think we should consider locking down our own prometheus as we grow those capabilities. But I don't think it's desirable that we implement access control by default. Not yet, anyway.
Really, what I'm trying to say in this part is: as we look at our API options, let's ensure that we can satisfy at least these needs. The impl plan will, of course, order this incrementally.
problem/0000-metrics-v1.md
> - Open Count, by
>   - Interface: `inbound | outbound`
>   - Workload
I wonder if it makes sense to expose Node information as well, to help correlate node issues.
problem/0000-metrics-v1.md
> 4. Supports Kubernetes RBAC;
> 5. Supports extensions for additional protocols, metrics, and metadata; and
> 6. Supports arbitrary workload hierarchies (i.e., for custom operators).
A couple of questions around the requirements:
- Should we add a requirement around the query interface? I've been thinking about what kinds of queries users will want to run (the linkerd UI is a great place to start there).
- Should there be a requirement around published clients? Having an API is great, publishing a client in your language of choice is way easier to use and consume.
- What kind of assumptions are we going to make about the backend data store? Should it be pluggable?
I'm mostly trying to keep this doc at the interface level, but I will dig in more on the query interface.
problem/0000-metrics-v1.md
> The metrics API is read-only.
>
> In order to support RBAC, all requests should be scoped to the resources on which metrics are
Do we want to be explicit about the RBAC policies? It could really run the gamut. I suspect that almost anything we do can be improved in the future, so it is more of an exercise around "out of scope" at this point.
Yeah, the design should provide an initial RBAC model. I don't think we should get so granular as to which metrics can be read... You can either read metrics for a pod or you can't.
problem/0000-metrics-v1.md
> - Open Count, by
>   - Interface: `inbound | outbound`
>   - Workload
In a perfect world, we'd get labels for everything recursively from the pod up to the final parent.
problem/0000-metrics-v1.md
> - Interface: `inbound | outbound`
> - Workload
> - Service
> - Service profile route
How would we have routes here?
I need to think some more about routes & retries... I'd like to eliminate some redundancy in how we export these metrics.
problem/0000-metrics-v1.md
> ##### gRPC
>
> How do we represent the gRPC-specific metadata that Linkerd exposes? It seems like it should be
This is a great opportunity to flesh out what metrics for another protocol would look like.
problem/0000-metrics-v1.md
> distinct from (but in addition to) the HTTP-level abstractions? Or should HTTP metrics have a
> hole to punch through arbitrary metadata?
>
> Should we provide gRPC stats scoped by gRPC service name? We can infer these via known path
Do we have any cardinality concerns with gRPC? Would it be possible to just collect per-method metrics by default?
I feel like this is generally small cardinality, but we could always put explicit limits in...
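If we wanted to check, here's a rough sketch of measuring that cardinality in PromQL; the `grpc_service` and `grpc_method` labels are hypothetical (the proxy doesn't export them today):

```promql
# Count the distinct (service, method) series each deployment would produce,
# using the hypothetical grpc_service / grpc_method labels:
count by (deployment) (
  count by (deployment, grpc_service, grpc_method) (response_total)
)
```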
problem/0000-metrics-v1.md
> - Request count, by
>   - Interface: `inbound | outbound`
>   - Authority
Do we want to differentiate between the real authority and the normalized one? Feels like it'd be valuable to look up `foo.com` as an authority even though it ended up at the normalized `foo.default.svc.cluster.local`.
I don't have a good mental model for thinking about this yet. We need a clearer idea of what resources express these mappings...
problem/0000-metrics-v1.md
> #### SMI
>
> [SMI's metrics][smi-spec] appear to be stringly typed, in that there are a few well-defined
`strongly`?
problem/0000-metrics-v1.md
> #### Prometheus
>
> Can we expose a [Prometheus API][prom-api] scoped under each resource? E.g.
I'm wondering if we can get everything we want from recording rules. That provides the stable API and solves some other issues. We'd still need to parse and such, but the surface area would be shrunk.
Recording rules are probably a good way to try this, yeah.
But, I think there are opportunities for us to be a bit clever about our storage/query patterns to take advantage of our understanding of the data model.
For example: Can we reduce the number of time-series that have to be stored/in memory by only storing the pod name in a metric label? What if, instead, we wrote a single `pod{name="",parent_kind="...",parent_name="...",label_foo="...",label_bar="..."} 1`? Then, if you want to do a deploy-specific lookup, instead of looking up `request_total{deployment="foo"}`, we would first expand the list of pods for the deployment and look up by `{pod="..."}`...
This is, of course, totally hypothetical and we'd want to dig on access patterns before optimizing this way. But my point is this plan gives us enough room to get creative...
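A hedged sketch of that lookup as a PromQL info-metric join; the `pod_info` series and its `parent_kind`/`parent_name` labels are hypothetical, mirroring the shape described above:

```promql
# Hypothetical: one "info" series per pod carries its ownership as labels, e.g.
#   pod_info{pod="web-abc123", parent_kind="deployment", parent_name="foo"} 1
# so request_total only needs a `pod` label. A deployment-scoped lookup then
# joins on `pod` instead of storing a deployment label on every request series:
  sum by (pod) (rate(request_total[5m]))
* on (pod) group_left (parent_name)
  pod_info{parent_kind="deployment", parent_name="foo"}
```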
@adleong @grampelberg I dropped all of the design notes and just left this with a simpler problem statement. Please ACK if you're good with this. I'll proceed with the meat of the proposal.
LGTM
This was accidentally closed when we changed the default branch to
@adleong @grampelberg PTAL. As I thought more about #15, I realized that the problem we're solving is that we don't have any contract around our metrics interface at all. We need one. I think when we look at this larger scope of the problem, the concrete need for source-metrics becomes clear, in order to satisfy a variety of requests in an access-controlled manner. But this is just one piece of the puzzle.
I think this implies that we move the Web/CLI off of the variety of use-case-specific metrics endpoints onto a generalized interface (where it makes sense, anyway).