Kubernetes: How to populate resource attributes based on attributes, labels and transformation #1756
Love the overall direction. This makes a lot of sense
docs/resource/k8s.md
Outdated
Choose the first value found:

- `pod.annotation[resource.opentelemetry.io/service.instance.id]`
- `concat([k8s.namespace.name, k8s.pod.name, k8s.container.name], '.')`
nit: should we use a delimiter that is less common in names? Maybe `/` or `:`?
`.` is already typically used for namespacing in OTel.
This isn't really namespacing, though. Is it? I guess at least most k8s names seem to use dashes, e.g. `kube-system` or `kube-addon-manager`. So maybe `.` isn't too bad.
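The "first value found" rule quoted in the diff for `service.instance.id` can be sketched as follows. This is a minimal illustration, not the operator's or collector's actual code: the function name `service_instance_id`, its parameters, and the plain-dict inputs are assumptions made for the example. The delimiter is a parameter so the `.` versus `/` or `:` question raised above is easy to experiment with.

```python
from typing import Mapping

# Annotation prefix taken from the diff under review.
ANNOTATION_PREFIX = "resource.opentelemetry.io/"


def service_instance_id(
    annotations: Mapping[str, str],
    namespace: str,
    pod_name: str,
    container_name: str,
    delimiter: str = ".",
) -> str:
    """Hypothetical helper: return the first value found, in the quoted order."""
    # 1. An explicit pod annotation takes precedence.
    explicit = annotations.get(ANNOTATION_PREFIX + "service.instance.id")
    if explicit:
        return explicit
    # 2. Otherwise join namespace, pod name and container name.
    return delimiter.join([namespace, pod_name, container_name])


if __name__ == "__main__":
    print(service_instance_id({}, "kube-system", "kube-addon-manager-0", "manager"))
```

Because Kubernetes object names commonly contain dashes rather than dots, the joined value stays readable with the default delimiter, which matches the conclusion of the thread.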
@@ -354,3 +354,70 @@ A CronJob creates Jobs on a repeating schedule.
<!-- endsemconv -->

[DocumentStatus]: https://opentelemetry.io/docs/specs/otel/document-status

## Specify resource attributes using Kubernetes annotations
I wonder if this should be an OTEP (or a non-normative guide) instead. In its current format it looks more like guidance than a spec.
Also, I wonder if this could be part of the `service.*` definition. What we are trying to do here is define how `service.*` attributes should be populated in a specific environment, not how K8s attributes are defined or affected.
@open-telemetry/specs-semconv-approvers thoughts?
We briefly discussed this during the k8s SemConv WG meeting. There is no consensus on the document yet, but we decided to recommend moving it from the |
docs/resource/k8s.md
Outdated
Choose the first value found:

- `pod.annotation[resource.opentelemetry.io/service.name]`
- `pod.label[app.kubernetes.io/name]` (well-known label
I think we should also look for the label `app.kubernetes.io/instance`. The value of this label is supposed to be a unique name that helps differentiate multiple instances of the same application. In a Helm chart, this will be the same as the `Release.Name`. So if a user installs an application twice for different use cases in the same namespace, this is the label that will differentiate the two services. And I think we should reference this before we check the more generic label `app.kubernetes.io/name`.
`service.name` isn't meant to differentiate multiple instances of the same application, right? The example from https://opentelemetry.io/docs/specs/semconv/attributes-registry/service/#service-attributes is "shoppingcart"
To clarify what I mean by an instance of an application, let me use an example. I have to run two different StatefulSets of the `mysql` db, each with more than one pod, in the same namespace. These are meant to be two distinct installations of the db, each maintaining its unique dataset.* Following the guidance of the k8s standard labels, I'll have the values below:
StatefulSet 1 - `app.kubernetes.io/name=mysql`, `app.kubernetes.io/instance=mysql-abc`
StatefulSet 2 - `app.kubernetes.io/name=mysql`, `app.kubernetes.io/instance=mysql-xyz`
My reading of the attribute's definition, "Logical name of the service.", makes me think the unique `mysql-abc` and `mysql-xyz` names are more appropriate, since I can differentiate which installation of the db the telemetry is coming from.
- edited for more clarity
@open-telemetry/specs-semconv-approvers can someone help clarify the intention behind `service.name` here?
Looking at https://kubernetes.io/docs/concepts/overview/working-with-objects/common-labels/#applications-and-instances-of-applications, I would say that `app.kubernetes.io/instance` is the right translation for `service.name`. I didn't see that before.
For the definition of `service.name`, see https://github.com/open-telemetry/semantic-conventions/blob/main/docs/attributes-registry/service.md
`service.name`: MUST be the same for all instances of horizontally scaled services
I'm wondering if we should fall back to `app.kubernetes.io/name` as the next best thing if `app.kubernetes.io/instance` is not found.
> I'm wondering if we should fall back to `app.kubernetes.io/name` as the next best thing if `app.kubernetes.io/instance` is not found.

yeah, that sounds good to me
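The lookup order this thread converges on for `service.name` (explicit annotation, then `app.kubernetes.io/instance`, then `app.kubernetes.io/name`) can be sketched like this. The helper name `service_name` and the dict-based inputs are assumptions for illustration; any further fallbacks in the document under review are not shown here.

```python
from typing import Mapping, Optional


def service_name(
    annotations: Mapping[str, str], labels: Mapping[str, str]
) -> Optional[str]:
    """Hypothetical helper: first non-empty candidate wins, else None."""
    candidates = [
        # Explicit override via the annotation from the diff.
        annotations.get("resource.opentelemetry.io/service.name"),
        # Preferred well-known label, as proposed in the thread.
        labels.get("app.kubernetes.io/instance"),
        # Fallback agreed on at the end of the thread.
        labels.get("app.kubernetes.io/name"),
    ]
    return next((value for value in candidates if value), None)


if __name__ == "__main__":
    mysql_labels = {
        "app.kubernetes.io/name": "mysql",
        "app.kubernetes.io/instance": "mysql-abc",
    }
    print(service_name({}, mysql_labels))
```

With the two StatefulSets from the example above, this ordering yields `mysql-abc` and `mysql-xyz` rather than `mysql` for both installations.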
docs/resource/k8s.md
Outdated
- `pod.annotation[resource.opentelemetry.io/service.version]`
- `pod.label[app.kubernetes.io/version]` (well-known label
  [app.kubernetes.io/version](https://kubernetes.io/docs/reference/labels-annotations-taints/#app-kubernetes-io-version))
- `if (contains(container docker image tag, '/') == false) container docker image tag`
Is there a reason we bail if the tag has `/`? The service attribute conventions don't define a format for version. If none of the other options work, wouldn't using the tag be better (not sure what the SDK defaults to)?
I'm unsure about this part of the spec.
I took this from the operator logic; the collector does it a bit differently.
I've found the reason: it was a guard for a port number separator, as can be seen here: https://github.com/distribution/reference/blob/284a39eaf3368476e0a4c6114a0eec61220acdd9/reference_test.go#L250
I've updated the spec to refer to that library, which is already used by the collector for `container.image.id`, with the notable difference that I think we should include the digest if available; it's the best source of truth for the service version.
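To make the port-separator guard concrete, here is a deliberately simplified sketch of splitting an image reference into name, tag and digest. It is not the `distribution/reference` library and does not validate the full reference grammar; the function names are hypothetical, and the `tag@digest` output format when both parts are present is an assumption made for illustration.

```python
from typing import Optional, Tuple


def split_image_reference(image: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Simplified split of an image reference into (name, tag, digest)."""
    # Everything after '@' is the digest, e.g. "sha256:...".
    name, _, digest = image.partition("@")
    tag = None
    head, sep, tail = name.rpartition(":")
    # If a '/' follows the last ':', that colon separates a registry port
    # (as in "localhost:5000/app"), not a tag.
    if sep and "/" not in tail:
        name, tag = head, tail
    return name, tag, digest or None


def service_version(image: str) -> Optional[str]:
    """Hypothetical helper: derive a version string from tag and/or digest."""
    _, tag, digest = split_image_reference(image)
    if tag and digest:
        return f"{tag}@{digest}"  # assumed combined format
    return tag or digest


if __name__ == "__main__":
    for ref in ("localhost:5000/app", "localhost:5000/app:1.2.3", "app@sha256:abc"):
        print(ref, "->", service_version(ref))
```

A production implementation would more likely delegate parsing to an existing reference parser, as the comment above suggests, instead of hand-rolled string splitting.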
docs/resource/k8s.md
Outdated
Choose the first value found:

- `pod.annotation[resource.opentelemetry.io/service.instance.id]`
- `concat([k8s.namespace.name, k8s.pod.name, k8s.container.name], '.')`
We discussed `service.instance.id` in the semconv meeting and had some doubts about the scenario where a container is restarted without a pod recreate (e.g. kubelet restarting a container after an OOM). The confusion was whether this id should change to reflect that a new instance of the container was started, which is what would happen, I think, if we left this to an SDK.
The convention for `service.instance.id` states that it MUST be unique for each instance of the same `service.namespace,service.name` pair (in other words, the `service.namespace,service.name,service.instance.id` triplet MUST be globally unique).
For a pod owned by a StatefulSet this is not the case. I guess the question is whether this id should differentiate between instances of a service at a given time or across all instances ever run. In the case of a StatefulSet, the id is not unique enough to differentiate between instances of a pod with the same sts ordinal, but k8s will only run one instance of a pod with a given ordinal at a time (the id remains the same after an sts pod is recreated). If we want a different id across pod recreates, switching to the pod uid instead of the name would work.
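The difference between a name-based and a uid-based id for a recreated StatefulSet pod can be illustrated with a small sketch. The `PodInfo` type, both helper functions, and the sample uids are hypothetical; the uid-based format simply swaps the pod name for the pod uid in the `concat` rule from the diff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PodInfo:
    """Hypothetical minimal view of a pod for this example."""
    namespace: str
    name: str
    uid: str
    container: str


def id_from_name(pod: PodInfo) -> str:
    # Mirrors concat([namespace, pod name, container name], '.').
    return ".".join([pod.namespace, pod.name, pod.container])


def id_from_uid(pod: PodInfo) -> str:
    # Assumed variant: use the pod uid in place of the pod name.
    return ".".join([pod.namespace, pod.uid, pod.container])


# The same StatefulSet ordinal before and after the pod is recreated:
# the name is stable, the uid is not.
before = PodInfo("db", "mysql-abc-0", "uid-1111", "mysql")
after = PodInfo("db", "mysql-abc-0", "uid-2222", "mysql")

if __name__ == "__main__":
    print(id_from_name(before) == id_from_name(after))
    print(id_from_uid(before) == id_from_uid(after))
```

Which behavior is preferable depends on whether "globally unique" is read as unique at a given time or unique across all time, which is the open question in this thread.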
I've generally interpreted "globally unique" here to mean unique at a given time, rather than unique across all time. IMO this is somewhat moot since SDKs are going to start generating unique IDs for service.instance.id anyways.
> IMO this is somewhat moot since SDKs are going to start generating unique IDs for service.instance.id anyways.

Are SDKs going to ignore any `service.instance.id` values passed in with the env var `OTEL_RESOURCE_ATTRIBUTES`? My understanding is that the documented values are currently being calculated and set in this var by the operator when auto-instrumenting.
> > IMO this is somewhat moot since SDKs are going to start generating unique IDs for service.instance.id anyways.
>
> Are SDKs going to ignore any `service.instance.id` values passed in with the env var `OTEL_RESOURCE_ATTRIBUTES`? My understanding is that the documented values are currently being calculated and set in this var by the operator when auto-instrumenting.

no, SDKs must honor whatever attributes are set using `OTEL_RESOURCE_ATTRIBUTES`
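As a rough sketch of what honoring `OTEL_RESOURCE_ATTRIBUTES` means here: the variable holds comma-separated `key=value` pairs, and a value supplied there should win over an id the SDK would otherwise generate. The function names and the merge logic below are assumptions for illustration, not any SDK's real implementation, and the parser skips the edge cases a real one would need to handle.

```python
import os
import uuid
from typing import Dict, Mapping
from urllib.parse import unquote


def parse_resource_attributes(raw: str) -> Dict[str, str]:
    """Simplified parse of 'k1=v1,k2=v2' into a dict; malformed pairs are skipped."""
    attrs: Dict[str, str] = {}
    for pair in raw.split(","):
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            continue
        # Values may be percent-encoded.
        attrs[key.strip()] = unquote(value.strip())
    return attrs


def resolve_resource(environ: Mapping[str, str]) -> Dict[str, str]:
    """Hypothetical merge: generated default first, environment values on top."""
    # Stand-in for an SDK-generated id (a random UUID, purely illustrative).
    resource = {"service.instance.id": str(uuid.uuid4())}
    # Attributes from the environment override the generated default.
    resource.update(parse_resource_attributes(environ.get("OTEL_RESOURCE_ATTRIBUTES", "")))
    return resource


if __name__ == "__main__":
    print(resolve_resource(os.environ)["service.instance.id"])
```

Under this model, an id computed by the operator and injected through the environment variable survives even once SDKs start generating their own defaults.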
> I've generally interpreted "globally unique" here to mean unique at a given time, rather than unique across all time.

that is also my understanding
@open-telemetry/specs-semconv-approvers the service page is automatically generated; how can I move the content of this PR to the service page?
…d prod (deployment.environment.name)
maybe as subsection(s) under https://github.com/open-telemetry/semantic-conventions/blob/main/docs/resource/README.md#service
@@ -22,6 +22,11 @@ This document defines standard attributes for resources. These attributes are ty
- [Semantic Attributes with Dedicated Environment Variable](#semantic-attributes-with-dedicated-environment-variable)
- [Semantic Attributes with SDK-provided Default Value](#semantic-attributes-with-sdk-provided-default-value)
- [Service](#service)
- [Service in Kubernetes](#service-in-kubernetes)
  - [How `service.namespace` is calculated](#how-servicenamespace-is-calculated)
Since this is currently applicable to auto-instrumentation with the operator, I think we should clarify that these are the guidelines suggested by the conventions and adopted in the operator. So maybe change the heading from
- [How `service.namespace` is calculated](#how-servicenamespace-is-calculated)

to
- [How `service.namespace` can be calculated](#how-servicenamespace-can-be-calculated)
good point
maybe "should" is better?
Fixes #236
This is a new take on the issue. #349 was created earlier but never completed.
So what's the difference now?
`service.instance.id` has been completed, which was effectively a blocker before.
Merge requirement checklist
[chore]