Make the set of labels consistent within each metric exported by Linkerd #4102

Open
siggy opened this issue Feb 26, 2020 · 14 comments
siggy (Member) commented Feb 26, 2020

Feature Request

What problem are you trying to solve?

Make the set of labels consistent within each metric exported by Linkerd.

Per the Prometheus doc, a single metric should always be exported with the same set of labels.

Prometheus' client_golang enforces this requirement in prometheus.Registry:
https://github.com/prometheus/client_golang/blob/673e4a177a5816096fb34777d8006fd138dbab2f/prometheus/desc.go#L35-L39
To put it another way, the set of metrics and labels exported by the Linkerd proxy would not be permitted in client_golang. For example:

response_total{
  namespace="linkerd",
  deployment="linkerd-controller",
}
response_total{
  namespace="linkerd",
  statefulset="linkerd-controller",
}

Additionally, if we want to answer the question: "What deployments are calling my service?", we need to query Prometheus like this:

response_total{
  deployment!="", 
  dst_deployment="linkerd-controller",
  dst_namespace="linkerd"
}

How should the problem be solved?

Introduce new labels:

  • workload
  • name
  • dst_workload
  • dst_name

Current example

response_total{
  deployment="linkerd-web", 
  dst_deployment="linkerd-controller", 
}

Proposed example

response_total{
  workload="deployment", 
  name="linkerd-web", 
  dst_workload="deployment", 
  dst_name="linkerd-controller", 
}

How would users interact with this feature?

This change would be transparent, provided existing labels were maintained. It would enable users to re-export and interact with Linkerd metrics in a Prometheus-standard way. For example, "What deployments are calling my service?" becomes:

response_total{
  workload="deployment", 
  dst_workload="deployment", 
  dst_name="linkerd-controller", 
  dst_namespace="linkerd"
}

Relates to #4101.

/cc @adleong @grampelberg

olix0r (Member) commented Apr 13, 2020

Are workload and name meaningful labels outside the context of Linkerd? Or are these just suggestions you've chosen?

olix0r (Member) commented Apr 13, 2020

Furthermore, are we sure this is the only set of labels that we do not apply consistently across all uses of a metric? It seems like this issue should be more broadly scoped to audit our metrics for this class of problem?

grampelberg (Contributor) commented

Ignoring the names, the problem from my perspective is that we're putting data we'd like to query into the names of labels (e.g. dst_deployment), when everything we'd like to query should be a value rather than a label name.

siggy (Member Author) commented Apr 13, 2020

@olix0r name aligns with the name field in k8s, similar to namespace. workload aligns with how the k8s docs group things like Deployment/DaemonSet/ReplicaSet:
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/

Auditing all metrics is a worthy goal. In practice, there's a subset of labels that are commonly used by linkerd stat, linkerd dashboard, Grafana, and other related projects I'm working on. In cases where metrics are exported out of linkerd-prometheus, we drop (sum over) any labels we do not care about. The labels called out in this ticket are the set we do care about. Do a grep -R "response_total{" * for more context.

adleong (Member) commented Apr 13, 2020

I don't think an audit of all metrics will be very helpful in this regard. We include the pod labels as Prometheus labels (#3833), so all bets are off in terms of having a uniform set of labels. I don't think it's feasible, given the way that we use Prometheus.

That said, I think adding workload_type, workload_name, dst_workload_type, and dst_workload_name metrics would be perfectly reasonable if it makes querying the metrics easier.

siggy (Member Author) commented Apr 13, 2020

I'd prefer name to workload_name, to better align with namespace.
I'd also prefer workload to workload_type, as workload_type feels redundant. If there's a compelling reason not to use workload, I'd prefer workload_kind to better align with k8s' kind field.

grampelberg (Contributor) commented

Just brainstorming and piling on @siggy's great points about making it more k8s-centric, what about:

  • resource_name
  • resource_kind

We could continue adding things like:

  • resource_version
  • resource_author

adleong (Member) commented Apr 14, 2020

Upon further inspection, this may be more difficult than anticipated.

These labels are generated by the Prometheus Kubernetes relabel config from meta labels like __meta_kubernetes_pod_label_linkerd_io_proxy_deployment=foo. As far as I can tell, there isn't a way in the relabel config to turn part of a label's name into a label value. So I don't see how we could create labels like resource_kind=deployment.

There may be some other way to approach this that doesn't utilize Prometheus relabel configs, but that seems like a much bigger undertaking and, from where I'm sitting, it doesn't seem worth it. Let me know if you disagree or if I've overlooked a simpler way of doing this.

siggy (Member Author) commented Apr 15, 2020

I think this should do it:

# for non-ReplicaSets (DaemonSet, StatefulSet)
# __meta_kubernetes_pod_controller_kind=DaemonSet
# __meta_kubernetes_pod_controller_name=foo
# =>
# workload_kind=DaemonSet
# workload_name=foo
- source_labels: [__meta_kubernetes_pod_controller_kind]
  action: replace
  target_label: workload_kind
- source_labels: [__meta_kubernetes_pod_controller_name]
  action: replace
  target_label: workload_name

# for ReplicaSets
# __meta_kubernetes_pod_controller_kind=ReplicaSet
# __meta_kubernetes_pod_controller_name=foo-bar-123
# =>
# workload_kind=Deployment
# workload_name=foo-bar
# deployment=foo
- source_labels: [__meta_kubernetes_pod_controller_kind]
  action: replace
  regex: ^ReplicaSet$
  target_label: workload_kind
  replacement: Deployment
- source_labels:
  - __meta_kubernetes_pod_controller_kind
  - __meta_kubernetes_pod_controller_name
  action: replace
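  # the two source labels are joined with Prometheus' default ";" separator before
  # the regex below is applied; the capture group (the ReplicaSet name minus its
  # trailing hash) becomes the new label value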
  regex: ^ReplicaSet;(.*)-[^-]+$
  target_label: workload_name

(screenshot attached: Screen Shot 2020-04-14 at 7 10 36 PM)
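
To make the end state concrete, a hedged sketch of what a scraped series might look like once these rules are in place, assuming a pod whose controller is a ReplicaSet named foo-bar-123 (the namespace value is illustrative only):

response_total{
  namespace="default",
  workload_kind="Deployment",
  workload_name="foo-bar",
}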

adleong (Member) commented Apr 15, 2020

Good call, @siggy, I think you're right.

See also linkerd/rfc#15 for an alternative approach.

siggy (Member Author) commented Apr 15, 2020

@adleong My reading of linkerd/rfc#15 is that it will solve #4101, but not necessarily this issue? Unless linkerd/rfc#15 includes proxies self-identifying their own workload_name and workload_kind?

adleong (Member) commented Apr 15, 2020

@siggy They certainly could, but is it necessary? My understanding of #4101 is that if all metrics (inbound and outbound) carry both src and dst metadata, then workload_name and workload_kind are not necessary. This is because on the inbound side, dst is self, and on the outbound side, src is self.

siggy (Member Author) commented Apr 15, 2020

@adleong Ok, just to confirm my understanding of linkerd/rfc#15, here is what I think/hope it is proposing:

Every request/response metric exported by the proxy (not at rest in Prometheus) has the following label keys:

  • direction
  • src_workload_kind
  • src_workload_name
  • dst_workload_kind
  • dst_workload_name

This implies three changes to the proxy /metrics endpoint (see the sketch after this list):

  • adding src_* labels to inbound request/response metrics
  • adding dst_* labels to inbound request/response metrics
  • adding src_* labels to outbound request/response metrics
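
For illustration, a hedged sketch of what an inbound series might look like under that proposal (workload names here are hypothetical):

response_total{
  direction="inbound",
  src_workload_kind="Deployment",
  src_workload_name="linkerd-web",
  dst_workload_kind="Deployment",
  dst_workload_name="linkerd-controller",
}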

adleong (Member) commented Apr 15, 2020

Maybe we should move this discussion over to that RFC, but yes.
