
OTLP resources for the opentelemetry-ebpf-profiler #628

Open
Gandem opened this issue Feb 19, 2025 · 6 comments

@Gandem commented Feb 19, 2025

Context

The intent of this issue is to discuss potential incompatibilities between the current OpenTelemetry protocol specification for profiles and the opentelemetry-ebpf-profiler, with regard to the usage of OTLP resources.

The opentelemetry-ebpf-profiler is a profiler that performs CPU profiling of all processes running on a single host.

The profiler will be built into a standalone collector distribution, where it would be configured as a collector receiver (RFC). This distribution would be deployed on each host for which the user wants to collect profiling data.

Problem Statement

Currently, by default, OTLP payloads generated by the opentelemetry-ebpf-profiler always contain a single Profile object (in a single ScopeProfile, in a single ResourceProfile) that holds stack traces for all processes running on the host. When off-CPU profiling is enabled, this adds a second ScopeProfile object under the same ResourceProfile.
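
For illustration, here is a minimal Go sketch of that nesting; the types are simplified stand-ins for the OTLP proto messages, not the actual generated code:

```go
package main

import "fmt"

// Simplified mirrors of the OTLP profiles messages. Field names follow the
// proto nesting but types are reduced for illustration.
type ProfilesData struct {
	ResourceProfiles []ResourceProfiles
}

type ResourceProfiles struct {
	ResourceAttributes map[string]string // today: host-level attributes only
	ScopeProfiles      []ScopeProfiles
}

type ScopeProfiles struct {
	Scope    string
	Profiles []Profile
}

type Profile struct {
	SampleCount int
}

func main() {
	// Shape of a default opentelemetry-ebpf-profiler payload: one
	// ResourceProfiles, one ScopeProfiles, one Profile covering every
	// process on the host (a second ScopeProfiles is added when off-CPU
	// profiling is enabled).
	payload := ProfilesData{
		ResourceProfiles: []ResourceProfiles{{
			ResourceAttributes: map[string]string{"host.name": "node-1"},
			ScopeProfiles: []ScopeProfiles{{
				Scope:    "on-cpu",
				Profiles: []Profile{{SampleCount: 12345}},
			}},
		}},
	}
	fmt.Printf("%+v\n", payload)
}
```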

As a consequence:

  • The top-level resource attributes hold only general information about the host on which the profiler is running.
  • Container attributes (e.g., Kubernetes pod name, deployment name, container name and ID, …) were intended to be attached to the Sample attributes.

This doesn’t align with other signals (metrics, traces, logs), for which both host and container attributes are attached to the top-level resource attributes. As mentioned in open-telemetry/opentelemetry-collector-contrib#37269, this makes the opentelemetry-ebpf-profiler’s approach incompatible with the k8sattributesprocessor (the processor automatically enriches data with Kubernetes metadata and expects Kubernetes attributes to be present in the top-level resource attributes).

However, in the current state of the profiling protocol specification, having Kubernetes attributes as top-level resource attributes would require splitting the profile into multiple ResourceProfiles (instead of a single profile per payload), which leads to the following problems:

Defining resources from profiled processes

The opentelemetry-ebpf-profiler profiles all processes running on a host (whether or not they run in pods/containers). It is unclear what the exact definition of a resource should be in that case:

  • Is every single process a separate resource? This might lead to an excessive number of resources for profiled runtimes which fork a lot (e.g. Python).
  • Is every single container a resource? If so, what do we do with non-containerized processes on the host: should we group them together in a single resource?
  • Is there any other definition we should consider in the context of the opentelemetry-ebpf-profiler?

In the current state, if we don’t intend to modify the model and want to keep compatibility with the k8sattributesprocessor, we need at least one resource per container. One possible grouping is sketched below.
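
A minimal Go sketch of the per-container grouping option, with non-containerized processes collapsed into a single host-level group (type and field names are hypothetical):

```go
package main

import "fmt"

// Sample is a reduced stand-in for an OTLP profiling sample; ContainerID is
// empty for non-containerized processes.
type Sample struct {
	PID         int
	ContainerID string
}

// groupByContainer sketches the "one resource per container" option: samples
// from the same container share a resource, and all non-containerized
// processes fall into a single host-level group (key "").
func groupByContainer(samples []Sample) map[string][]Sample {
	groups := make(map[string][]Sample)
	for _, s := range samples {
		groups[s.ContainerID] = append(groups[s.ContainerID], s)
	}
	return groups
}

func main() {
	samples := []Sample{
		{PID: 101, ContainerID: "abc"},
		{PID: 102, ContainerID: "abc"},
		{PID: 7, ContainerID: ""}, // host process, no container
	}
	for cid, group := range groupByContainer(samples) {
		fmt.Printf("resource %q: %d samples\n", cid, len(group))
	}
}
```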

To some extent, this problem intersects with some of the challenges discussed in Resources and Entities.

One additional challenge is that a resource is currently defined as the entity producing telemetry: strictly speaking, the opentelemetry-ebpf-profiler produces the profiles for all processes running on the host. In that case, the entities being observed (the individual processes) differ from the entity producing the telemetry (the opentelemetry-ebpf-profiler).

Performance impact on lookup tables

Depending on the resource definition we land on, we need to be mindful of the impact on the different lookup tables. Currently, the Profile object contains lookup tables that are used to deduplicate information from stack traces: for example, this avoids repeatedly storing the same function names or sample/location attributes.
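
For illustration, a minimal Go sketch of this interning scheme, loosely modeled on the proto's string_table and Function.name_strindex (simplified, not the generated types):

```go
package main

import "fmt"

// Profile holds a string table that deduplicates repeated values: functions
// store an index into the table instead of the string itself.
type Profile struct {
	StringTable []string
	Functions   []Function
}

type Function struct {
	NameStrindex int // index into StringTable
}

func main() {
	p := Profile{
		StringTable: []string{"", "main.work"}, // index 0 is conventionally ""
	}
	// Three frames of the same function share one table entry.
	for i := 0; i < 3; i++ {
		p.Functions = append(p.Functions, Function{NameStrindex: 1})
	}
	fmt.Println(p.StringTable[p.Functions[0].NameStrindex]) // main.work
}
```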

The goal of these lookup tables is to keep the size of a profile reasonable. While deduplication should only marginally affect the size of the payload on the wire (due to compression), it does affect the memory footprint of the decompressed, deserialized payload (in the ebpf profiler, then in the collector).

Splitting processes into multiple ResourceProfiles means that they will no longer share lookup tables. The granularity at which we split influences the overhead (for example, splitting by process ID leads to drastically higher overhead than splitting per container, due to runtimes such as Python that fork often).

We could consider moving the lookup tables to the ProfilesData level; however, this would make merging multiple ProfilesData payloads (e.g., for batching) harder, since it would require merging their lookup tables (which is possible, but might require further changes to the spec to do efficiently). A sketch of such a merge follows.
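
A minimal Go sketch of what merging two string tables would involve, assuming every index from the second payload then has to be rewritten through the returned remap (illustrative only, not collector code):

```go
package main

import "fmt"

// mergeStringTables appends src's new entries to dst and builds a remap from
// src's old indices to their positions in the merged table. The remap must
// then be applied to every index field in the merged payload.
func mergeStringTables(dst, src []string) (merged []string, remap []int32) {
	index := make(map[string]int32, len(dst))
	for i, s := range dst {
		index[s] = int32(i)
	}
	merged = dst
	remap = make([]int32, len(src))
	for i, s := range src {
		j, ok := index[s]
		if !ok {
			j = int32(len(merged))
			merged = append(merged, s)
			index[s] = j
		}
		remap[i] = j
	}
	return merged, remap
}

func main() {
	a := []string{"", "python3", "libc.so"}
	b := []string{"", "libc.so", "java"}
	merged, remap := mergeStringTables(a, b)
	fmt.Println(merged) // ["" python3 libc.so java]
	fmt.Println(remap)  // [0 2 3]: every index from payload b must be rewritten
}
```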

Recap

Currently, by default, the profiler generates OTLP payloads containing a single Profile object with stack traces for all processes on a host. This creates issues with resource attributes, particularly container attributes, which are attached to Sample attributes, not top-level resource attributes. This incompatibility makes the profiler's approach inconsistent with other signals and processors like the k8sattributesprocessor.

This leads to the following challenges:

  • Defining resources for the profiler, considering various granularities (per process, per container, or another concept).
  • Impact on lookup tables and payload size when splitting processes into multiple ResourceProfiles.

The goal of this issue is to start a discussion on these challenges with impacted SIGs and determine next steps.

@tigrannajaryan (Member)

@open-telemetry/profiling-maintainers FYI.

@aalexand (Member)

On the problem of duplication:

Maybe sharing just the string table would be practical enough? We could add another string table in the parent container, and require that string IDs are disjoint between the profile-local and the profile-parent-shared tables.

This may be insufficient, though, given how we intern labels: the key string is not interned there because we use OTel's key-value type. It might be useful if OTel had a first-class concept of memoized string tables and a key-value type supporting that (including for the key).
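
For illustration, a minimal Go sketch of how such disjoint IDs could resolve, assuming the parent-shared table occupies the low index range (purely hypothetical; neither table exists in the current proto):

```go
package main

import "fmt"

// Tables sketches the disjoint-ID scheme suggested above: indices below the
// shared table's length resolve to the parent-shared table, everything above
// resolves to the profile-local table.
type Tables struct {
	Shared []string // lives in the parent container, shared across profiles
	Local  []string // profile-local
}

func (t Tables) Lookup(idx int) string {
	if idx < len(t.Shared) {
		return t.Shared[idx]
	}
	return t.Local[idx-len(t.Shared)]
}

func main() {
	t := Tables{
		Shared: []string{"", "libc.so"},
		Local:  []string{"my_func"},
	}
	fmt.Println(t.Lookup(1)) // libc.so (shared)
	fmt.Println(t.Lookup(2)) // my_func (local)
}
```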

@povilasv

Regarding:

> This doesn’t align with other signals (metrics, traces, logs), where generally, a resource represents both the entity producing the telemetry data and the entity being observed (e.g. a process in a container in a pod generating metrics).

I would remove this part, since it's not really correct. Some examples:

  • hostmetrics process scraper - collects metrics for multiple resources (processes).
  • filelog receiver - in the context of k8s, collects logs for multiple pods, although it runs on a node.
  • kubelet stats receiver - collects for multiple pods, although it runs on a node.
  • k8scluster receiver - collects metrics for a lot of different objects, although it runs in a single deployment.
  • ...

> As mentioned in open-telemetry/opentelemetry-collector-contrib#37269, this makes the opentelemetry-ebpf-profiler’s approach incompatible with the k8sattributesprocessor

In my opinion it's incompatible with OTel more broadly.

This part of the OTel spec defines a resource (https://opentelemetry.io/docs/specs/otel/overview/#resources):

> Resource captures information about the entity for which telemetry is recorded. For example, metrics exposed by a Kubernetes container can be linked to a resource that specifies the cluster, namespace, pod, and container name.

IMO the profiler should also capture information about the observed entity in the Resource.

Breaking this creates many problems:

  • What do you do about OTTL?
  • What about K8s / process resource attributes defined in the semantic conventions?
  • Other Collector components that rely on the resource model (routing connector, load-balancing exporter, ...)?

@florianl (Contributor)

With #609 there was a proposed change to lift commonly used lookup tables to a higher protocol field. At that time open-telemetry/community#2492 was under discussion and the SIG asked to wait.
With some more experience, in particular around reporting multiple different ScopeProfiles, I think attribute_table and string_table should be lifted all the way up to ProfilesData.

Splitting Samples across multiple ResourceProfiles will increase the payload significantly. One reason for the significant increase is that ResourceProfiles don't have a concept of referencing values. By using Sample.attribute_indices, the OTel profiling signal is efficient at reducing duplicate values (see the sketch below).
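
For illustration, a minimal Go sketch of that referencing scheme (simplified stand-ins for the proto types):

```go
package main

import "fmt"

// KeyValue and Profile are reduced stand-ins: attributes live once in a
// profile-level table and samples reference them by index via
// attribute_indices.
type KeyValue struct{ Key, Value string }

type Profile struct {
	AttributeTable []KeyValue
	Samples        []Sample
}

type Sample struct {
	AttributeIndices []int32
}

func main() {
	p := Profile{
		AttributeTable: []KeyValue{
			{Key: "container.id", Value: "abc"},
			{Key: "k8s.pod.name", Value: "web-0"},
		},
	}
	// Thousands of samples from the same pod all reuse the two entries above.
	for i := 0; i < 3; i++ {
		p.Samples = append(p.Samples, Sample{AttributeIndices: []int32{0, 1}})
	}
	fmt.Printf("%d samples share %d attribute entries\n",
		len(p.Samples), len(p.AttributeTable))
}
```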

@Gandem (Author) commented Feb 20, 2025

> I would remove this part, since it's not really correct. Some examples

@povilasv Thank you for the examples, I'll update the issue. I took a look at the receivers mentioned; my understanding is that:

  • The hostmetrics process scraper submits one resource per process, with the following resource attributes.
  • The filelog receiver, when used with the container parser, includes the container's resource attributes, with one ResourceLogs field per container (done in the converter).
  • The kubelet stats receiver also creates one resource per container, with the following attributes.

Overall, depending on how the telemetry is collected, it looks like a resource is defined either at the process level (process scraper) or at the container level (understandable, since the kubelet, for example, doesn't expose per-process metrics).

@Gandem (Author) commented Feb 20, 2025

I believe there are some things we might want to consider with regard to lifting the attribute_table and string_table to the ProfilesData level:

  • Do we use these lookup tables for resource and scope attributes (or only for sample/location attributes)? If we use them for resource and scope attributes, we'll need to support this in all components that rely on these attributes when adding support for the profiling signal (for example OTTL, the routing connector, ...). And this will not be aligned with other signals (unless we introduce a, possibly optional, OTel first-class concept of memoized string tables for all signals).

  • Orthogonally, we'll need to support merging of top-level lookup tables. For example, the batch processor currently batches payloads by simply appending resource profiles from the multiple payloads; it would need additional logic to merge the lookup tables for ProfilesData, including rewriting indices (see the sketch below).
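
A minimal sketch of the index rewriting such a merge would imply, assuming a remap table was produced while merging attribute tables (hypothetical helper, not actual batch processor code):

```go
package main

import "fmt"

// rewriteIndices maps every old index from an appended payload to its
// position in the merged top-level table.
func rewriteIndices(indices []int32, remap []int32) {
	for i, old := range indices {
		indices[i] = remap[old]
	}
}

func main() {
	// remap produced when payload B's attribute table was merged into A's:
	// B's entry 0 became 4, entry 1 became 2.
	remap := []int32{4, 2}
	sampleIndices := []int32{0, 1, 0}
	rewriteIndices(sampleIndices, remap)
	fmt.Println(sampleIndices) // [4 2 4]
}
```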
