
Kubernetes: How to populate resource attributes based on attributes, labels and transformation #1756

Open · wants to merge 10 commits into base: main
Conversation

@zeitlinger (Member) commented Jan 17, 2025:

Fixes #236

This is a new take on the issue; #349 was created before but not completed.

So what's the difference now?

  1. The proposed rules are already implemented in the OTel operator and work well: https://github.com/open-telemetry/opentelemetry-operator#configure-resource-attributes. This PR (at least in its initial form) makes the operator implementation fully compatible with this spec PR.
  2. The specification work on service.instance.id has been completed, which was effectively a blocker before.

Merge requirement checklist

@zeitlinger requested review from a team as code owners, January 17, 2025
@dashpole (Contributor) left a comment:

Love the overall direction. This makes a lot of sense.

docs/resource/k8s.md (outdated, resolved)
Choose the first value found:

- `pod.annotation[resource.opentelemetry.io/service.instance.id]`
- `concat([k8s.namespace.name, k8s.pod.name, k8s.container.name], '.')`
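The two-step fallback quoted above can be sketched in Go. This is a minimal, illustrative sketch: only the annotation key and the `.` delimiter come from the proposed rule; the function name and map shape are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// serviceInstanceID sketches the proposed precedence: an explicit pod
// annotation wins; otherwise namespace, pod and container name are joined
// with '.' (the delimiter debated in this thread).
func serviceInstanceID(annotations map[string]string, namespace, pod, container string) string {
	if v, ok := annotations["resource.opentelemetry.io/service.instance.id"]; ok && v != "" {
		return v
	}
	return strings.Join([]string{namespace, pod, container}, ".")
}

func main() {
	// No annotation set, so the concatenated fallback is used.
	fmt.Println(serviceInstanceID(nil, "default", "mysql-abc-0", "mysql")) // prints "default.mysql-abc-0.mysql"
}
```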
Contributor:

nit: should we use a delimiter that is less common in names? Maybe / or :?

Member Author:

`.` is already typically used for namespacing in OTel

@dashpole (Contributor) commented Jan 21, 2025:

This isn't really namespacing, though, is it? I guess at least most k8s names seem to use dashes, e.g. kube-system or kube-addon-manager. So maybe `.` isn't too bad.

@ChrsMark requested a review from a team, January 20, 2025
@@ -354,3 +354,70 @@ A CronJob creates Jobs on a repeating schedule.
<!-- endsemconv -->

[DocumentStatus]: https://opentelemetry.io/docs/specs/otel/document-status

## Specify resource attributes using Kubernetes annotations
Member:

I wonder if this should be an OTEP (or a non-normative guide) instead.
In its current format it looks more like guidance, not a spec.

Also I wonder if this could be part of the service.* definition. What we are trying to do here is define how service.* attributes should be populated in a specific environment, not how K8s attributes are defined/affected.

@open-telemetry/specs-semconv-approvers thoughts?

@dmitryax (Member):

We briefly discussed this during the k8s SemConv WG meeting. There is no consensus on the document yet, but we decided to recommend moving it from the k8s resource section of the semantic conventions repo to service because it prescribes a way of populating values of service attributes, not attributes in k8s namespace.

Choose the first value found:

- `pod.annotation[resource.opentelemetry.io/service.name]`
- `pod.label[app.kubernetes.io/name]` (well-known label
[app.kubernetes.io/name](https://kubernetes.io/docs/reference/labels-annotations-taints/#app-kubernetes-io-name))
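The precedence quoted above can be sketched in Go. This is an illustrative sketch: the annotation and label keys come from the proposed rule; the function name and return shape are assumptions.

```go
package main

import "fmt"

// resolveServiceName sketches the quoted precedence for service.name:
// the explicit pod annotation wins over the well-known
// app.kubernetes.io/name label. Keys come from the proposed rule;
// the function shape is illustrative only.
func resolveServiceName(annotations, labels map[string]string) (string, bool) {
	for _, candidate := range []struct {
		m   map[string]string
		key string
	}{
		{annotations, "resource.opentelemetry.io/service.name"},
		{labels, "app.kubernetes.io/name"},
	} {
		if v, ok := candidate.m[candidate.key]; ok && v != "" {
			return v, true
		}
	}
	return "", false
}

func main() {
	name, _ := resolveServiceName(nil, map[string]string{"app.kubernetes.io/name": "mysql"})
	fmt.Println(name) // prints "mysql"
}
```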
Contributor:

I think we should also look for the label app.kubernetes.io/instance. The value of this label is supposed to be a unique name that helps differentiate multiple instances of the same application. In a Helm chart, this will be the same as the Release.Name. So if a user installs an application twice for different use cases in the same namespace, this is the label that will differentiate the two services. And I think we should check it before the more generic label app.kubernetes.io/name.

Contributor:

service.name isn't meant to differentiate multiple instances of the same application, right? The example from https://opentelemetry.io/docs/specs/semconv/attributes-registry/service/#service-attributes is "shoppingcart".

@jinja2 (Contributor) commented Jan 23, 2025:

To clarify what an instance of an application means, let me use an example. Say I have to run two different StatefulSets of the MySQL database, each with more than one pod, in the same namespace. These are meant to be two distinct installations of the database, each maintaining its own dataset. Following the guidance of the Kubernetes standard labels, I'll have the values below:

- StatefulSet 1: `app.kubernetes.io/name=mysql`, `app.kubernetes.io/instance=mysql-abc`
- StatefulSet 2: `app.kubernetes.io/name=mysql`, `app.kubernetes.io/instance=mysql-xyz`

My reading of the attribute's definition ("Logical name of the service.") makes me think the unique mysql-abc and mysql-xyz names are more appropriate, since I can differentiate which installation of the database the telemetry is coming from.


Contributor:

@open-telemetry/specs-semconv-approvers can someone help clarify the intention behind service.name here?

Member Author:

Looking at https://kubernetes.io/docs/concepts/overview/working-with-objects/common-labels/#applications-and-instances-of-applications, I would say that app.kubernetes.io/instance is the right translation for service.name. I didn't see that before.

Definition of service.name - see https://github.com/open-telemetry/semantic-conventions/blob/main/docs/attributes-registry/service.md

service.name: MUST be the same for all instances of horizontally scaled services

Member Author:

I'm wondering if we should fall back to app.kubernetes.io/name as the next best thing if app.kubernetes.io/instance is not found.

Contributor:

I'm wondering if we should fall back to app.kubernetes.io/name as the next best thing if app.kubernetes.io/instance is not found.

yeah, that sounds good to me

- `pod.annotation[resource.opentelemetry.io/service.version]`
- `pod.label[app.kubernetes.io/version]` (well-known label
[app.kubernetes.io/version](https://kubernetes.io/docs/reference/labels-annotations-taints/#app-kubernetes-io-version))
- `if (contains(container docker image tag, '/') == false) container docker image tag`
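The precedence quoted above, including the `/` guard on the image tag (which the thread identifies as protecting against a registry `host:port` being mistaken for a tag), can be sketched in Go. The keys come from the quoted rule; the function name and signature are illustrative assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveServiceVersion sketches the quoted precedence for service.version:
// annotation, then the well-known version label, then the container image
// tag, but only when the tag contains no '/'. Keys come from the quoted
// rule; the function shape is illustrative only.
func resolveServiceVersion(annotations, labels map[string]string, imageTag string) (string, bool) {
	if v, ok := annotations["resource.opentelemetry.io/service.version"]; ok && v != "" {
		return v, true
	}
	if v, ok := labels["app.kubernetes.io/version"]; ok && v != "" {
		return v, true
	}
	// Guard: a '/' suggests the "tag" is actually part of a repository
	// reference (e.g. a registry host:port), so bail rather than misreport.
	if imageTag != "" && !strings.Contains(imageTag, "/") {
		return imageTag, true
	}
	return "", false
}

func main() {
	v, _ := resolveServiceVersion(nil, nil, "1.2.3")
	fmt.Println(v) // prints "1.2.3"
}
```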
Contributor:

Is there a reason we bail if the tag has a `/`? The service attribute conventions don't define a format for version. If none of the other options work, wouldn't using the tag be better (not sure what the SDK defaults to)?

Member Author:

I'm unsure about this part of the spec.

I took this from the operator's logic; the collector does it a bit differently.

Member Author:

I've found the reason: it was a guard against the port number separator, as can be seen here: https://github.com/distribution/reference/blob/284a39eaf3368476e0a4c6114a0eec61220acdd9/reference_test.go#L250

I've updated the spec to refer to that library, which is already used by the collector for container.image.id.

... with the notable difference that I think we should include the digest if available; it's the best source of truth for the service version.

Choose the first value found:

- `pod.annotation[resource.opentelemetry.io/service.instance.id]`
- `concat([k8s.namespace.name, k8s.pod.name, k8s.container.name], '.')`
Contributor:

We discussed service.instance.id in the semconv meeting and had some doubts about the scenario where a container is restarted without the pod being recreated (e.g. the kubelet restarting a container after an OOM kill). The confusion was whether this id should change to reflect that a new instance of the container was started, which, I think, is what would happen if we left this to an SDK.

@jinja2 (Contributor) commented Jan 23, 2025:

The convention for service.instance.id states that it MUST be unique for each instance of the same `service.namespace`, `service.name` pair (in other words, the `service.namespace`, `service.name`, `service.instance.id` triplet MUST be globally unique). For a pod owned by a StatefulSet this is not the case. I guess the question is whether this id should differentiate between instances of a service at a given time or across all instances ever run. In the case of a StatefulSet, the id is not unique enough to differentiate between instances of a pod with the same ordinal, but Kubernetes will only run one instance of a pod with a given ordinal at a time (the id will remain the same after a StatefulSet pod is recreated). If we want a different id across pod recreates, switching to the pod UID instead of the name would work.

Contributor:

I've generally interpreted "globally unique" here to mean unique at a given time, rather than unique across all time. IMO this is somewhat moot since SDKs are going to start generating unique IDs for service.instance.id anyways.

@jinja2 (Contributor) commented Jan 24, 2025:

IMO this is somewhat moot since SDKs are going to start generating unique IDs for service.instance.id anyways.

Are SDKs going to ignore any service.instance.id values passed in with the env var OTEL_RESOURCE_ATTRIBUTES? My understanding is that the documented values are currently calculated and set in this var by the operator when auto-instrumenting.

Member Author:

IMO this is somewhat moot since SDKs are going to start generating unique IDs for service.instance.id anyways.

Are SDKs going to ignore any service.instance.id values passed in with the env var OTEL_RESOURCE_ATTRIBUTES? My understanding is that the documented values are currently calculated and set in this var by the operator when auto-instrumenting.

No, SDKs must honor whatever attributes are set using OTEL_RESOURCE_ATTRIBUTES.

Member Author:

I've generally interpreted "globally unique" here to mean unique at a given time, rather than unique across all time.

That is also my understanding.

@zeitlinger (Member Author):

We briefly discussed this during the k8s SemConv WG meeting. There is no consensus on the document yet, but we decided to recommend moving it from the k8s resource section of the semantic conventions repo to service because it prescribes a way of populating values of service attributes, not attributes in k8s namespace.

@open-telemetry/specs-semconv-approvers the service page is automatically generated - how can I move the content of this PR to the service page?

@github-actions bot added the `enhancement` (New feature or request) label, Jan 28, 2025
@trask (Member) commented Jan 28, 2025:

@open-telemetry/specs-semconv-approvers the service page is automatically generated - how can I move the content of this PR to the service page?

maybe as subsection(s) under https://github.com/open-telemetry/semantic-conventions/blob/main/docs/resource/README.md#service

@@ -22,6 +22,11 @@ This document defines standard attributes for resources. These attributes are ty
- [Semantic Attributes with Dedicated Environment Variable](#semantic-attributes-with-dedicated-environment-variable)
- [Semantic Attributes with SDK-provided Default Value](#semantic-attributes-with-sdk-provided-default-value)
- [Service](#service)
- [Service in Kubernetes](#service-in-kubernetes)
- [How `service.namespace` is calculated](#how-servicenamespace-is-calculated)
Contributor:

Since this currently applies to auto-instrumentation with the operator, I think we should clarify that these are guidelines suggested by the conventions and adopted by the operator. So maybe change the heading to

Suggested change
- [How `service.namespace` is calculated](#how-servicenamespace-is-calculated)
- [How `service.namespace` can be calculated](#how-servicenamespace-can-be-calculated)

Member Author:

Good point.

Maybe "should" is better?

Labels
area:k8s enhancement New feature or request
Projects
Status: In Review
Development

Successfully merging this pull request may close these issues.

Proposal: Define mapping from k8s well-known labels to semconv
8 participants