From 0561c40597c75dcbb60f99eed4e35b1e2ea88582 Mon Sep 17 00:00:00 2001 From: Alex Leong Date: Tue, 14 Apr 2020 17:13:27 -0700 Subject: [PATCH 1/5] Add source metadata RFC Signed-off-by: Alex Leong --- design/0003-source-metadata.md | 182 +++++++++++++++++++++++++++++++++ 1 file changed, 182 insertions(+) create mode 100644 design/0003-source-metadata.md diff --git a/design/0003-source-metadata.md b/design/0003-source-metadata.md new file mode 100644 index 0000000..06e88bd --- /dev/null +++ b/design/0003-source-metadata.md @@ -0,0 +1,182 @@ +- Contribution Name: Source Metadata for Traffic Metrics +- Implementation Owner: Alex Leong +- Start Date: 2020-04-14 +- Target Date: +- RFC PR: +- Linkerd Issue: +- Reviewers: + +# Summary + +[summary]: #summary + +When a meshed pod sends outgoing HTTP requests the Linkerd proxy records metrics +such as latency and request counters and scopes those metrics by the +destination. This is done by setting a `dst_X` label on the Prometheus metrics +where `X` is the destination workload kind. On the other hand, when a meshed +pod receives incoming HTTP requests, there is no equivalent scoping of the +metrics by source. In other words, there is no `src_X` Prometheus label, making +it impossible to break down metrics for incoming traffic by source. This RFC +proposes adding `src_X` Prometheus labels for incoming HTTP traffic. + +# Problem Statement (Step 1) + +[problem-statement]: #problem-statement + +Linkerd is not able to add `src_X` labels today because it simply has no +knowledge of the source resource. It knows the peer socket address of the +source, but has no mechanism to convert that address into a Kubernetes resource +name or type. This is in contrast to the `dst_X` metadata which the proxy +gets from the Destination controller when doing service discovery look-ups. + +This asymmetry in metadata can be very limiting when doing queries. 
It is
+impossible to determine who the clients of a resource are by looking at that
+resource's metrics alone. Instead, we need to query the outbound metrics of all
+other resources to find a client with the appropriate `dst_X` label. Not only
+does this make the query awkward, it also means that resource-to-resource
+metrics can only be observed on the client side, never on the server side. This
+limits our ability to measure network latency.
+
+Adding source metadata to HTTP traffic metrics would enable improvements in the
+Linkerd Grafana dashboard, 3rd party tools that consume Linkerd's Prometheus
+metrics, the controller's StatSummary API, and consequently the `linkerd stat`
+CLI command and Linkerd dashboard. These improvements are out of the scope of
+this proposal.
+
+# Design proposal (Step 2)
+
+[design-proposal]: #design-proposal
+
+We will use the [Downward
+API](https://kubernetes.io/docs/tasks/inject-data-application/downward-api-volume-expose-pod-information/#the-downward-api)
+to create and mound a volume in meshed pods which contains a file with the pod
+owner's resource name and kind. This is very similar to [the approach used to
+populate pod labels on span
+annotations](https://github.com/linkerd/linkerd2/pull/4199). These values can
+be extracted from pod labels such as `linkerd.io/proxy-deployment`,
+`linkerd.io/proxy-daemonset`, etc. We can create a simple startup script in the
+proxy container to preprocess these pod labels into the labels that we would
+like the proxy to use. 
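As a sketch only, such a preprocessing step could look like the following hypothetical Go helper (the `linkerd.io/proxy-<kind>` label keys and the set of owner kinds are assumptions based on the labels mentioned above; whether this runs as a shell script or a small binary is an implementation detail):

```go
package main

import (
	"fmt"
	"strings"
)

// kinds lists the per-kind owner labels assumed to be set on the pod
// (assumption: one linkerd.io/proxy-<kind> label identifies the owner).
var kinds = []string{"deployment", "daemonset", "statefulset", "replicaset", "job"}

// rewriteLabels keeps only the owner-identifying label from the Downward API
// file and emits it in the resource_name/resource_kind form described above.
func rewriteLabels(downward string) string {
	for _, line := range strings.Split(downward, "\n") {
		key, value, ok := strings.Cut(line, "=")
		if !ok {
			continue
		}
		for _, kind := range kinds {
			if key == "linkerd.io/proxy-"+kind {
				name := strings.Trim(value, `"`)
				return fmt.Sprintf("resource_name=%q\nresource_kind=%q\n", name, kind)
			}
		}
	}
	// No owner label found (e.g. an unowned pod): emit nothing.
	return ""
}

func main() {
	downward := "app=\"web-svc\"\n" +
		"linkerd.io/control-plane-ns=\"linkerd\"\n" +
		"linkerd.io/proxy-deployment=\"web\"\n" +
		"pod-template-hash=\"5cb99f85d8\""
	fmt.Print(rewriteLabels(downward))
}
```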
For example, the Downward API would create a volume with
+a file containing:
+
+```
+app="web-svc"
+linkerd.io/control-plane-ns="linkerd"
+linkerd.io/proxy-deployment="web"
+pod-template-hash="5cb99f85d8"
+```
+
+Our startup script would rewrite this to keep only the labels that we want and
+to put them in the desired format:
+
+```
+resource_name="web"
+resource_kind="deployment"
+```
+
+The proxy would read this file to get a list of labels to apply to its traffic
+metrics. The way the proxy uses these labels differs for the inbound and
+outbound stacks.
+
+
+For the inbound stack, the proxy will prepend the `dst_` prefix to these labels
+and use them to scope the inbound metrics. This will result in metrics such as:
+
+```
+request_total{
+  direction="inbound",
+  dst_resource_name="web",
+  dst_resource_kind="deployment",
+  ...
+}
+```
+
+The `dst_` prefix is used like this because on the inbound side, the destination
+is the local resource. Source labels are also added to the inbound metrics but
+we'll come back to that in a moment.
+
+For the outbound stack, the proxy will prepend the `src_` prefix to the labels
+and use them to scope the outbound metrics. The corresponding `dst_` labels
+will be populated by the dst metadata from the destination controller. This
+will result in metrics such as:
+
+```
+request_total{
+  direction="outbound",
+  src_resource_name="web",
+  src_resource_kind="deployment",
+  dst_resource_name="emoji",
+  dst_resource_kind="deployment",
+  ...
+}
+```
+
+The outbound stack will also encode the source labels in an HTTP header of the
+outgoing request called `l5d-src-labels`. An example of the value of this
+header would be: `resource_name=web,resource_kind=deployment`. Encoding this
+source metadata on the request allows it to be used by the inbound stack of the
+destination proxy. Remember when we said we'd come back to inbound? 
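As an illustrative sketch, encoding and decoding this header value might look like the following (hypothetical Go helpers; the comma-separated `key=value` format is taken from the example above, and sorting keys for a deterministic header value is an assumption, not part of the proposal):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// encodeSrcLabels renders labels in the comma-separated key=value form used
// in the l5d-src-labels example above, sorting keys for determinism.
func encodeSrcLabels(labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+labels[k])
	}
	return strings.Join(parts, ",")
}

// decodeSrcLabels parses a header value back into a label map, skipping
// malformed pairs.
func decodeSrcLabels(v string) map[string]string {
	labels := map[string]string{}
	for _, part := range strings.Split(v, ",") {
		if k, val, ok := strings.Cut(part, "="); ok {
			labels[k] = val
		}
	}
	return labels
}

func main() {
	h := encodeSrcLabels(map[string]string{"resource_name": "web", "resource_kind": "deployment"})
	fmt.Println(h) // resource_kind=deployment,resource_name=web
	fmt.Println(decodeSrcLabels(h)["resource_name"])
}
```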
+ +In addition to populating the `dst_` labels, the inbound stack will also read +the `l5d-src-labels` HTTP header from the request, prepend `src_` to them, and +add them to the label scope. Thus, the complete inbound metrics would actually +look like: + + +``` +request_total{ + direction="inbound", + dst_resource_name="web", + dst_resource_kind="deployment", + src_resource_name="vote-bot", + src_resource_kind="deployment", +} +``` + +Note that all of the changes described here are additive to the existing +Prometheus labels and would not introduce any backwards incompatibility. + +This change does increase the cardinality of Prometheus time-series since +inbound metrics will now be scoped by source resource. This will roughly bring +the cardinality of the inbound metrics to the same as the outbound metrics. + +Note also that this implementation means that source metadata will NOT be +available when the source resource is not meshed. + +There are no known blockers or prerequisites before this work can be started. + +# Prior art + +[prior-art]: #prior-art + +The Istio mixer telemetry architecture showcases a different approach to +populating source metadata. In that architecture, rather than having data plane +proxies expose metrics to Prometheus directly, the metrics are ingested by the +control plane and post-processed. Source IP addresses are resolved to source +resources before being stored. + +It would be difficult to incorporate this approach in Linkerd without +introducing significant complexity. The fact that Prometheus is able to scrape +Linkerd proxies directly is a great property. Alternatively, having the Linkerd +proxy resolve IP addresses to resources would either require an API call which +would introduce latency in the critical data path or would require an in-process +cache which would increase the proxy memory footprint. + +There is also precedent in Linkerd for proxies to communicate metadata to one +another using the `l5d-*` headers. 
For example, the fully qualified authority
+is communicated between proxies using the `l5d-dst-canonical` header.
+
+# Unresolved questions
+
+[unresolved-questions]: #unresolved-questions
+
+Existing metrics stack modules in the proxy have a fixed label scope per stack.
+However, this proposal describes labels set dynamically from the incoming
+request. Is this feasible in the proxy stack architecture?
+
+# Future possibilities
+
+[future-possibilities]: #future-possibilities
+
+Make use of these new metrics in the Grafana dashboards, StatSummary API, Linkerd
+CLI and Linkerd dashboard.

From bef860dcbac5e9c471e9e3fec725d560b826b038 Mon Sep 17 00:00:00 2001
From: Alex Leong
Date: Tue, 21 Apr 2020 14:30:34 -0700
Subject: [PATCH 2/5] Move context sharing out of scope

Signed-off-by: Alex Leong
---
 design/0003-source-metadata.md | 35 +++++++++++++++++----------------
 1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/design/0003-source-metadata.md b/design/0003-source-metadata.md
index 06e88bd..ae407c8 100644
--- a/design/0003-source-metadata.md
+++ b/design/0003-source-metadata.md
@@ -17,7 +17,7 @@ where `X` is the destination workload kind. On the other hand, when a meshed
 pod receives incoming HTTP requests, there is no equivalent scoping of the
 metrics by source. In other words, there is no `src_X` Prometheus label, making
 it impossible to break down metrics for incoming traffic by source. This RFC
-proposes adding `src_X` Prometheus labels for incoming HTTP traffic.
+proposes adding `src_X` Prometheus labels for all meshed HTTP traffic.
 
 # Problem Statement (Step 1)
 
@@ -49,7 +49,7 @@ this proposal.
 
 We will use the [Downward
 API](https://kubernetes.io/docs/tasks/inject-data-application/downward-api-volume-expose-pod-information/#the-downward-api)
-to create and mound a volume in meshed pods which contains a file with the pod
+to create and mount a volume in meshed pods which contains a file with the pod
 owner's resource name and kind. This is very similar to [the approach used to
 populate pod labels on span
 annotations](https://github.com/linkerd/linkerd2/pull/4199). These values can
@@ -78,7 +78,6 @@ The proxy would read this file to get a list of labels to apply to its traffic
 metrics. The way the proxy uses these labels differs for the inbound and
 outbound stacks.
 
-
 For the inbound stack, the proxy will prepend the `dst_` prefix to these labels
 and use them to scope the inbound metrics. This will result in metrics such as:
 
@@ -111,16 +110,12 @@ request_total{
 }
 ```
 
-The outbound stack will also encode the source labels in an HTTP header of the
-outgoing request called `l5d-src-labels`. An example of the value of this
-header would be: `resource_name=web,resource_kind=deployment`. Encoding this
-source metadata on the request allows it to be used by the inbound stack of
-the destination proxy. Remember when we said we'd come back to inbound?
+The outbound stack will also share the source labels with the inbound stack of
+the destination proxy. Remember when we said we'd come back to inbound?
 
-In addition to populating the `dst_` labels, the inbound stack will also read
-the `l5d-src-labels` HTTP header from the request, prepend `src_` to them, and
-add them to the label scope. Thus, the complete inbound metrics would actually
-look like:
+In addition to populating the `dst_` labels, the inbound stack will also use the
+source labels, prepend `src_` to them, and add them to the label scope.
+Thus, the complete inbound metrics would actually look like:
 
 
 ```
@@ -133,6 +128,9 @@ request_total{
 }
 ```
 
+The details of how the source labels will be shared between the source outbound
+proxy and the destination inbound proxy are out of scope for this RFC.
+
 Note that all of the changes described here are additive to the existing
 Prometheus labels and would not introduce any backwards incompatibility. 
@@ -162,17 +160,18 @@ proxy resolve IP addresses to resources would either require an API call which
 would introduce latency in the critical data path or would require an in-process
 cache which would increase the proxy memory footprint.
 
-There is also precedent in Linkerd for proxies to communicate metadata to one
-another using the `l5d-*` headers. For example, the fully qualified authority
-is communicated between proxies using the `l5d-dst-canonical` header.
+In Istio configurations which do not use mixer, source context is communicated
+through a base64-encoded map of key-value pairs in an HTTP header. This
+approach requires reading and decoding this header on a per-request basis, even
+though we know that the source metadata will be the same for all requests on a
+single connection. By communicating source metadata at the connection level,
+we can avoid doing this work for each request.
 
 # Unresolved questions
 
 [unresolved-questions]: #unresolved-questions
 
-Existing metrics stack modules in the proxy have a fixed label scope per stack.
-However, this proposal describes labels set dynamically from the incoming
-request. Is this feasible in the proxy stack architecture?
+The details of how to share source metadata will be discussed in another RFC.
 
 # Future possibilities

From beac023b91b59bb99e2438a604532c30785dffcf Mon Sep 17 00:00:00 2001
From: Alex Leong
Date: Wed, 22 Apr 2020 14:36:23 -0700
Subject: [PATCH 3/5] Add examples of queries we cannot answer today

Signed-off-by: Alex Leong
---
 design/0003-source-metadata.md | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/design/0003-source-metadata.md b/design/0003-source-metadata.md
index ae407c8..5008994 100644
--- a/design/0003-source-metadata.md
+++ b/design/0003-source-metadata.md
@@ -37,6 +37,25 @@ does this make the query awkward, it also means that resource-to-resource
 metrics can only be observed on the client side, never on the server side. This
 limits our ability to measure network latency.
 
+More specifically, here are some examples of questions that cannot be answered
+without introducing source metadata:
+
+* We cannot compare client-side and server-side metrics for the same traffic to
+  identify latency or errors introduced between the two Linkerd proxies (e.g. by
+  the network or by other intermediary proxies).
+* It is difficult to present traffic metrics in a consistent way: top line
+  resource metrics are measured on the server-side by default, but in order to
+  view a breakdown of these metrics by source, the absence of source metadata
+  means that we have to switch to displaying client-side metrics. This is
+  confusing at best and misleading at worst.
+* For traffic from unmeshed sources, the problem is even worse. In this case
+  we don't have client-side metrics at all and can't display any metrics for
+  this traffic. Introducing source metadata would allow us to distinguish
+  between meshed and unmeshed traffic on the server side. While we would not
+  be able to distinguish between different unmeshed sources, we would at least
+  be able to show metrics for traffic from all unmeshed sources aggregated
+  together. 
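The client-side-only querying described above can be made concrete with a hedged PromQL sketch (label names such as `deployment` and `dst_deployment` are illustrative and depend on the Prometheus scrape configuration):

```
# Today: finding the clients of the "web" deployment means scanning every
# other resource's outbound (client-side) metrics for a matching dst_ label.
sum(rate(request_total{direction="outbound", dst_deployment="web"}[1m]))
  by (deployment)

# There is no server-side equivalent: inbound metrics carry no src_* labels,
# so there is nothing source-related to group by.
```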
+
 Adding source metadata to HTTP traffic metrics would enable improvements in the
 Linkerd Grafana dashboard, 3rd party tools that consume Linkerd's Prometheus
 metrics, the controller's StatSummary API, and consequently the `linkerd stat`
 CLI command and Linkerd dashboard. These improvements are out of the scope of
 this proposal.

From c918aa90a92d6a7183fdc61b522ff83d6e04734f Mon Sep 17 00:00:00 2001
From: Alex Leong
Date: Thu, 23 Apr 2020 14:17:21 -0700
Subject: [PATCH 4/5] Remove point about identifying non-meshed sources

Signed-off-by: Alex Leong
---
 design/0003-source-metadata.md | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/design/0003-source-metadata.md b/design/0003-source-metadata.md
index 5008994..3b90810 100644
--- a/design/0003-source-metadata.md
+++ b/design/0003-source-metadata.md
@@ -48,13 +48,6 @@
 view a breakdown of these metrics by source, the absence of source metadata
 means that we have to switch to displaying client-side metrics. This is
 confusing at best and misleading at worst.
-* For traffic from unmeshed sources, the problem is even worse. In this case
-  we don't have client-side metrics at all and can't display any metrics for
-  this traffic. Introducing source metadata would allow us to distinguish
-  between meshed and unmeshed traffic on the server side. While we would not
-  be able to distinguish between different unmeshed sources, we would at least
-  be able to show metrics for traffic from all unmeshed sources aggregated
-  together.
 
Adding source metadata to HTTP traffic metrics would enable improvements in the
 Linkerd Grafana dashboard, 3rd party tools that consume Linkerd's Prometheus

From d15afc0d454d18e38681363476114134d3e91f85 Mon Sep 17 00:00:00 2001
From: Alex Leong
Date: Thu, 23 Apr 2020 16:50:45 -0700
Subject: [PATCH 5/5] Rework RFC to be metrics API focused

Signed-off-by: Alex Leong
---
 design/0003-source-metadata.md | 125 +++++++++++++++++++++------------
 1 file changed, 80 insertions(+), 45 deletions(-)

diff --git a/design/0003-source-metadata.md b/design/0003-source-metadata.md
index 3b90810..a5ad5e2 100644
--- a/design/0003-source-metadata.md
+++ b/design/0003-source-metadata.md
@@ -10,55 +10,91 @@
 
 [summary]: #summary
 
-When a meshed pod sends outgoing HTTP requests the Linkerd proxy records metrics
-such as latency and request counters and scopes those metrics by the
-destination. This is done by setting a `dst_X` label on the Prometheus metrics
-where `X` is the destination workload kind. On the other hand, when a meshed
-pod receives incoming HTTP requests, there is no equivalent scoping of the
-metrics by source. In other words, there is no `src_X` Prometheus label, making
-it impossible to break down metrics for incoming traffic by source. This RFC
-proposes adding `src_X` Prometheus labels for all meshed HTTP traffic.
+Linkerd's metrics API lacks the ability to query for server-side metrics when
+doing a resource-to-resource query. When metrics for a single resource are
+requested, the returned metrics are always measured on the server-side. But
+when metrics for traffic between two resources are requested, the returned
+metrics are always measured on the client-side. This behavior is both
+unintuitive and limiting. Users are frequently unaware or surprised that these
+two types of queries are measured differently. Without any way to measure
+resource-to-resource traffic metrics on the server-side, it is impossible to
+compare client-side to server-side metrics for this type of traffic to identify
+network-introduced latency or errors.
 
 # Problem Statement (Step 1)
 
 [problem-statement]: #problem-statement
 
-Linkerd is not able to add `src_X` labels today because it simply has no
-knowledge of the source resource. It knows the peer socket address of the
-source, but has no mechanism to convert that address into a Kubernetes resource
-name or type. This is in contrast to the `dst_X` metadata which the proxy
-gets from the Destination controller when doing service discovery look-ups.
-
-This asymmetry in metadata can be very limiting when doing queries. It is
-impossible to determine who the clients of a resource are by looking at that
-resource's metrics alone. Instead, we need to query the outbound metrics of all
-other resources to find a client with the appropriate `dst_X` label. Not only
-does this make the query awkward, it also means that resource-to-resource
-metrics can only be observed on the client side, never on the server side. This
-limits our ability to measure network latency.
-
-More specifically, here are some examples of questions that cannot be answered
-without introducing source metadata:
-
-* We cannot compare client-side and server-side metrics for the same traffic to
-  identify latency or errors introduced between the two Linkerd proxies (e.g. by
-  the network or by other intermediary proxies).
-* It is difficult to present traffic metrics in a consistent way: top line
-  resource metrics are measured on the server-side by default, but in order to
-  view a breakdown of these metrics by source, the absence of source metadata
-  means that we have to switch to displaying client-side metrics. This is
-  confusing at best and misleading at worst.
-
-Adding source metadata to HTTP traffic metrics would enable improvements in the
-Linkerd Grafana dashboard, 3rd party tools that consume Linkerd's Prometheus
-metrics, the controller's StatSummary API, and consequently the `linkerd stat`
-CLI command and Linkerd dashboard. These improvements are out of the scope of
-this proposal.
+Linkerd's metrics API requests have this structure:
+
+```
+message StatSummaryRequest {
+  ResourceSelection selector = 1;
+  string time_window = 2;
+
+  oneof outbound {
+    Empty none = 3;
+    Resource to_resource = 4;
+    Resource from_resource = 5;
+  }
+
+  bool skip_stats = 6; // true if we want to skip stats from Prometheus
+  bool tcp_stats = 7;
+}
+```
+
+If the `outbound` field is set to `none`, then metrics are measured on the
+server-side of the `selector` resource. However, if the `outbound` field is
+set to `to_resource`, then the metrics are measured on the client-side of the
+`selector` resource. Finally, if the `outbound` field is set to `from_resource`,
+then the metrics are measured on the client-side of the `from_resource`.
+
+This API is confusing for a few reasons:
+
+* Some types of queries are measured on the server-side while others are
+  measured on the client-side.
+* Some types of queries are measured from the `selector` resource while others
+  are not.
+
+More importantly, some types of queries are not possible: it is not possible to
+query for resource-to-resource traffic measured on the server-side. This
+limitation is significant because it means that it is impossible to compare
+client-side to server-side metrics for this type of traffic to identify
+network-introduced latency or errors.
 
 # Design proposal (Step 2)
 
 [design-proposal]: #design-proposal
 
+We will change the semantics of the StatSummary API to behave in a more
+predictable and consistent way. Specifically, we will rename the `outbound`
+field to `edge` and change the semantics to be that traffic is always measured
+at the `selector` resource. This means that when `edge` is set to `none`,
+traffic will be measured on the server-side of the `selector` resource (no
+change from today's behavior), when `edge` is set to `to_resource`, traffic is
+measured on the client-side of the `selector` resource (no change from today's
+behavior), and when `edge` is set to `from_resource`, traffic is measured on
+the server-side of the `selector` resource.
+
+An alternative to modifying the semantics of the StatSummary API is to create a
+new metrics API that would eventually replace StatSummary. This has the added
+benefit of giving us the opportunity to simplify this API by allowing us to
+drop support for features which are not core to traffic metrics, such as meshed
+pod count, as well as moving traffic split metrics into a command of their own.
+This approach may also avoid unpredictable behavior when using mismatched CLI
+and control plane versions. However, building a new API would require a larger
+effort than tweaking the existing one.
+
+In order to satisfy the new semantics, our Prometheus data must be rich enough
+to be able to select traffic metrics from specific sources when measuring on
+the server (inbound) side. The inbound proxy does not attach any labels to its
+metrics that allow selection by traffic source. This is because it simply has
+no knowledge of the source resource. It knows the peer socket address of the
+source, but has no mechanism to convert that address into a Kubernetes resource
+name or type. This is in contrast to the `dst_X` metadata which the proxy gets
+from the Destination controller when doing service discovery look-ups. We will
+add a corresponding `src_X` label that identifies the source resource.
 
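As a sketch, the renamed request message might look like this (field numbers are kept from the existing message; whether the `oneof` variants keep their names or only their semantics change is left to implementation):

```
message StatSummaryRequest {
  ResourceSelection selector = 1;
  string time_window = 2;

  oneof edge {
    Empty none = 3;             // server-side of `selector` (unchanged)
    Resource to_resource = 4;   // client-side of `selector` (unchanged)
    Resource from_resource = 5; // server-side of `selector` (new semantics)
  }

  bool skip_stats = 6; // true if we want to skip stats from Prometheus
  bool tcp_stats = 7;
}
```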
We will use the [Downward
 API](https://kubernetes.io/docs/tasks/inject-data-application/downward-api-volume-expose-pod-information/#the-downward-api)
 to create and mount a volume in meshed pods which contains a file with the pod
@@ -141,7 +177,8 @@ request_total{
 ```
 
 The details of how the source labels will be shared between the source outbound
-proxy and the destination inbound proxy are out of scope for this RFC.
+proxy and the destination inbound proxy are out of scope for this RFC but are
+discussed in the [Context Sharing RFC](https://github.com/linkerd/rfc/pull/20).
 
 Note that all of the changes described here are additive to the existing
 Prometheus labels and would not introduce any backwards incompatibility.
@@ -153,8 +190,6 @@ the cardinality of the inbound metrics to the same as the outbound metrics.
 
 Note also that this implementation means that source metadata will NOT be
 available when the source resource is not meshed.
 
-There are no known blockers or prerequisites before this work can be started.
-
 # Prior art
 
 [prior-art]: #prior-art
@@ -183,11 +218,11 @@ we can avoid doing this work for each request.
 
 [unresolved-questions]: #unresolved-questions
 
-The details of how to share source metadata will be discussed in another RFC.
+The details of how to share source metadata will be discussed in
+[another RFC](https://github.com/linkerd/rfc/pull/20).
 
 # Future possibilities
 
 [future-possibilities]: #future-possibilities
 
-Make use of these new metrics in the Grafana dashboards, StatSummary API, Linkerd
-CLI and Linkerd dashboard.
+Make use of this new metrics API in the Linkerd CLI and dashboard.