Add source metadata RFC #15

design/0003-source-metadata.md (200 additions, 0 deletions)

- Contribution Name: Source Metadata for Traffic Metrics
- Implementation Owner: Alex Leong
- Start Date: 2020-04-14
- Target Date:
- RFC PR:
- Linkerd Issue:
- Reviewers:

# Summary

[summary]: #summary

When a meshed pod sends outgoing HTTP requests, the Linkerd proxy records metrics
such as latency and request counters and scopes those metrics by the
destination. This is done by setting a `dst_X` label on the Prometheus metrics
where `X` is the destination workload kind. On the other hand, when a meshed
pod receives incoming HTTP requests, there is no equivalent scoping of the
metrics by source. In other words, there is no `src_X` Prometheus label, making
it impossible to break down metrics for incoming traffic by source. This RFC
proposes adding `src_X` Prometheus labels to all meshed HTTP traffic.
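
For example, an outbound request counter today carries destination labels roughly
like this (an illustrative sample; the full label set is larger):

```
request_total{
direction="outbound",
dst_deployment="emoji",
dst_namespace="emojivoto",
...
}
```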

# Problem Statement (Step 1)

[problem-statement]: #problem-statement

Linkerd is not able to add `src_X` labels today because it simply has no
knowledge of the source resource. It knows the peer socket address of the
source, but has no mechanism to convert that address into a Kubernetes resource
name or type. This is in contrast to the `dst_X` metadata which the proxy
gets from the Destination controller when doing service discovery look-ups.

> **Member:** So, I understand this limitation. But can we be clearer about the problem here? What specific features / use cases are we not able to satisfy without these metrics?
>
> Especially because this proposal only includes this metadata on meshed connections (so we're guaranteed to have the client metrics).
>
> What do these labels enable, concretely, that we can't do otherwise? Or if we do not add them, what risks do we incur?

> **Member Author:** The next paragraph talks about the limitations we face from not having these metrics.

This asymmetry in metadata can be very limiting when doing queries. It is
impossible to determine who the clients of a resource are by looking at that
resource's metrics alone. Instead, we need to query the outbound metrics of all
other resources to find a client with the appropriate `dst_X` label. Not only
does this make the query awkward, it also means that resource-to-resource
metrics can only be observed on the client side, never on the server side. This
limits our ability to measure network latency.

> **Member (@olix0r, Apr 22, 2020):** So? Is the problem "this is awkward?" or are there fundamental limitations that we incur here?
>
> I want to be careful about looking at this benefit versus the cost of the implementation...

> **Member Author:** I've added some explicit examples of questions that we can't answer without source metadata.


More specifically, here are some examples of questions that cannot be answered
without introducing source metadata:

* We cannot compare client-side and server-side metrics for the same traffic to
identify latency or errors introduced between the two Linkerd proxies (e.g. by
the network or by other intermediary proxies)

> **Member:** To put a finer point on this... there's a specific API call for which we return incorrect data, right? What is that API call?
>
> I think this may be the only problem we're trying to address with this proposal. I'd try to take out much of the other context about labels and asymmetry and focus very concretely on this issue.

> **Member Author:** I don't think that's quite right. I don't think any of our API calls return incorrect data; their behavior is just confusing and non-uniform.

> **Member:** This makes it sound like it's our metrics API that is the problem: that we have no way to distinguish these concepts at all.
>
> To put it another way: if we made the changes proposed here, what impact would that have on our metrics API? Would we just subtly change the semantics of an existing call? Would we be unable to represent this call?
>
> I view Linkerd's Prometheus data as diagnostics... like log lines, basically. The format & details may change between releases as long as we satisfy our metrics API, which powers consumers within Linkerd. So, a problem statement phrased in terms of our Prometheus data isn't all that compelling.
>
> However, it absolutely does sound like a problem if there are fundamental questions that our API cannot answer about mesh traffic. How do we improve Linkerd's metrics API so that it can properly differentiate client-side and server-side metrics? If this requires source labels, great, I'm all in.

> **Member Author:** Reworked the RFC to be API focused.

> **Member:** Love the updated framing.
>
> If it's alright with you, I'm going to suggest some further edits via PR to contextualize how I see this all fitting together.

* It is difficult to present traffic metrics in a consistent way: top-line
  resource metrics are measured on the server side by default, but in order to
  view a breakdown of these metrics by source, the absence of source metadata
  means that we have to switch to displaying client-side metrics. This is
  confusing at best and misleading at worst.

> **Member:** This is still a bit squishy... It's difficult, but we do it. I.e. for consumers of our API, there's no overhead. Right?

> **Member Author:** We do not present traffic metrics in a consistent way today. We're forced to either hide the distinction between server-side and client-side metrics, which is deceitful, or be explicit that certain queries are measured on one side while other queries are measured on the other, which is confusing. Today we split the difference by being vague. So I see this as a problem for consumers of the API.

* For traffic from unmeshed sources, the problem is even worse. In this case
we don't have client-side metrics at all and can't display any metrics for
this traffic. Introducing source metadata would allow us to distinguish
between meshed and unmeshed traffic on the server side. While we would not
be able to distinguish between different unmeshed sources, we would at least
be able to show metrics for traffic from all unmeshed sources aggregated
together.

> **Member:** I don't think that this RFC actually addresses this issue... We already have client IDs on server metrics.

> **Member Author:** True, though we don't use that data for this purpose today. We could use it, but it would be more straightforward to use source metadata. While this would not impact consumers of the API, I still think that having a cleaner implementation is of some value.
>
> However, I have removed this point for now.


Adding source metadata to HTTP traffic metrics would enable improvements in the
Linkerd Grafana dashboard, 3rd party tools that consume Linkerd's Prometheus
metrics, the controller's StatSummary API, and consequently the `linkerd stat`
CLI command and Linkerd dashboard. These improvements are out of the scope of
this proposal.

# Design proposal (Step 2)

[design-proposal]: #design-proposal

We will use the [Downward
API](https://kubernetes.io/docs/tasks/inject-data-application/downward-api-volume-expose-pod-information/#the-downward-api)
to create and mount a volume in meshed pods which contains a file with the pod
owner's resource name and kind. This is very similar to [the approach used to
populate pod labels on span
annotations](https://github.com/linkerd/linkerd2/pull/4199). These values can
be extracted from pod labels such as `linkerd.io/proxy-deployment`,
`linkerd.io/proxy-daemonset`, etc. We can create a simple startup script in the
proxy container to preprocess these pod labels into the labels that we would
like the proxy to use. For example, the Downward API would create a volume with
a file containing:

```
app="web-svc"
linkerd.io/control-plane-ns="linkerd"
linkerd.io/proxy-deployment="web"
pod-template-hash="5cb99f85d8"
```

Our startup script would rewrite this to keep only the labels that we want and
to put them in the desired format:

```
resource_name="web"
resource_kind="deployment"
```
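
As a rough sketch of that preprocessing step, the following Go program shows the
intended transformation. The RFC does not prescribe a language (a few lines of
shell in the proxy container would work equally well), and the mount path and
the set of recognized `linkerd.io/proxy-<kind>` labels below are assumptions for
the example:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// Workload kinds that the proxy-injector records as linkerd.io/proxy-<kind>
// pod labels. The exact set is an assumption for this sketch.
var kinds = []string{"deployment", "daemonset", "statefulset", "replicationcontroller", "job"}

func main() {
	// Path where the Downward API labels file is mounted; the actual mount
	// path is an assumption for this sketch.
	in, err := os.Open("/var/run/linkerd/podinfo/labels")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer in.Close()

	scanner := bufio.NewScanner(in)
	for scanner.Scan() {
		key, value, ok := strings.Cut(strings.TrimSpace(scanner.Text()), "=")
		if !ok {
			continue // skip blank or malformed lines
		}
		for _, kind := range kinds {
			if key == "linkerd.io/proxy-"+kind {
				// e.g. linkerd.io/proxy-deployment="web" becomes:
				//   resource_kind="deployment"
				//   resource_name="web"
				fmt.Printf("resource_kind=%q\n", kind)
				fmt.Printf("resource_name=%s\n", value) // value is already quoted in the source file
			}
		}
	}
	// All other labels are dropped.
}
```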

The proxy would read this file to get a list of labels to apply to its traffic
metrics. The way the proxy uses these labels differs for the inbound and
outbound stacks.

For the inbound stack, the proxy will prepend the `dst_` prefix to these labels
and use them to scope the inbound metrics. This will result in metrics such as:

```
request_total{
direction="inbound",
dst_resource_name="web",
dst_resource_kind="deployment",
...
}
```

The `dst_` prefix is used like this because on the inbound side, the destination
is the local resource. Source labels are also added to the inbound metrics but
we'll come back to that in a moment.

For the outbound stack, the proxy will prepend the `src_` prefix to the labels
and use them to scope the outbound metrics. The corresponding `dst_` labels
will be populated by the dst metadata from the destination controller. This
will result in metrics such as:

```
request_total{
direction="outbound",
src_resource_name="web",
src_resource_kind="deployment",
dst_resource_name="emoji",
dst_resource_kind="deployment",
...
}
```

The outbound stack will also share the source labels with the inbound stack of
the destination proxy. Remember when we said we'd come back to inbound?

In addition to populating the `dst_` labels, the inbound stack will also use the
source labels, prepend `src_` to them, and add them to the label scope.
Thus, the complete inbound metrics would actually look like:


```
request_total{
direction="inbound",
dst_resource_name="web",
dst_resource_kind="deployment",
src_resource_name="vote-bot",
src_resource_kind="deployment",
}
```
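
To make the composition of these label sets concrete, here is a small sketch
(with hypothetical helper names, and written in Go purely for illustration; the
proxy itself is written in Rust) of how an inbound proxy could merge its own
metadata-file labels under a `dst_` prefix with the peer-supplied labels under a
`src_` prefix:

```go
package main

import "fmt"

// prefixLabels returns a copy of labels with prefix prepended to every key.
func prefixLabels(prefix string, labels map[string]string) map[string]string {
	out := make(map[string]string, len(labels))
	for k, v := range labels {
		out[prefix+k] = v
	}
	return out
}

func main() {
	// Labels read from this pod's own metadata file (the local workload).
	local := map[string]string{"resource_name": "web", "resource_kind": "deployment"}
	// Labels shared by the peer's outbound proxy (the sharing mechanism is
	// out of scope for this RFC).
	peer := map[string]string{"resource_name": "vote-bot", "resource_kind": "deployment"}

	// On the inbound side the local workload is the destination and the peer
	// is the source, so the two sets are merged under dst_ and src_ prefixes.
	inbound := map[string]string{"direction": "inbound"}
	for k, v := range prefixLabels("dst_", local) {
		inbound[k] = v
	}
	for k, v := range prefixLabels("src_", peer) {
		inbound[k] = v
	}
	fmt.Println(inbound)
	// map[direction:inbound dst_resource_kind:deployment dst_resource_name:web
	//     src_resource_kind:deployment src_resource_name:vote-bot]
}
```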

The details of how the source labels will be shared between the source outbound
proxy and the destination inbound proxy are out-of-scope of this RFC.

Note that all of the changes described here are additive to the existing
Prometheus labels and would not introduce any backwards incompatibility.

This change does increase the cardinality of Prometheus time-series since
inbound metrics will now be scoped by source resource. This will bring the
cardinality of the inbound metrics roughly in line with that of the outbound
metrics.

Note also that this implementation means that source metadata will NOT be
available when the source resource is not meshed.

There are no known blockers or prerequisites before this work can be started.

# Prior art

[prior-art]: #prior-art

The Istio mixer telemetry architecture showcases a different approach to
populating source metadata. In that architecture, rather than having data plane
proxies expose metrics to Prometheus directly, the metrics are ingested by the
control plane and post-processed. Source IP addresses are resolved to source
resources before being stored.

It would be difficult to incorporate this approach in Linkerd without
introducing significant complexity. The fact that Prometheus is able to scrape
Linkerd proxies directly is a great property. Alternatively, having the Linkerd
proxy resolve IP addresses to resources would either require an API call which
would introduce latency in the critical data path or would require an in-process
cache which would increase the proxy memory footprint.

In Istio configurations which do not use Mixer, source context is communicated
through a base64-encoded map of key-value pairs sent in an HTTP header. This
approach requires reading and decoding this header on a per-request basis, even
though we know that the source metadata will be the same for all requests on a
single connection. By communicating source metadata at the connection level,
we can avoid doing this work for each request.

> **Contributor:** The non-Mixer Istio actually does this differently still. Have you taken a peek at how they're doing it now?

> **Member Author:** I haven't. I'll check it out.

# Unresolved questions

[unresolved-questions]: #unresolved-questions

The details of how to share source metadata will be discussed in another RFC.

# Future possibilities

[future-possibilities]: #future-possibilities

Make use of these new metrics in the Grafana dashboards, StatSummary API, Linkerd
CLI and Linkerd dashboard.