OTLP resources for the opentelemetry-ebpf-profiler #628
Comments
@open-telemetry/profiling-maintainers FYI.
On the problem of duplication: maybe sharing just the string table would be practical enough? We could add another string table in the parent container, and require that string IDs are disjoint between the profile-local and the profile-parent-shared tables. This may be insufficient though, given how we intern labels: the key string is not interned there because we use OTel's key-value type. It might be useful if OTel had a first-class concept of memoized string tables and a key-value type supporting that (including for the key).
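A minimal sketch of the disjoint-ID idea, assuming shared IDs occupy the range below the shared table's length and profile-local IDs start right after it. The `TwoTierStrings` class and its layout are hypothetical and not part of the OTLP profiles proto; the sketch also does not address the label-key interning issue mentioned above.

```python
class TwoTierStrings:
    """Hypothetical two-tier string table: shared IDs first, local IDs after."""

    def __init__(self, shared):
        self.shared = list(shared)   # parent-level table, treated as read-only
        self.local = []              # profile-local table
        self._index = {s: i for i, s in enumerate(self.shared)}

    def intern(self, s):
        """Return the ID for s, adding it to the local table if unknown."""
        if s not in self._index:
            # Local IDs are offset by len(shared), so the two ranges never overlap.
            self._index[s] = len(self.shared) + len(self.local)
            self.local.append(s)
        return self._index[s]

    def resolve(self, string_id):
        """Map an ID back to its string; IDs below len(shared) are shared."""
        if string_id < len(self.shared):
            return self.shared[string_id]
        return self.local[string_id - len(self.shared)]


shared = ["", "container.id", "main"]
profile_a = TwoTierStrings(shared)
profile_b = TwoTierStrings(shared)
id_a = profile_a.intern("handle_request")  # new string, stored locally in A
id_b = profile_b.intern("main")            # already shared, nothing stored in B
```

With this layout, a reader only needs the shared table's length to decide which table an ID points into.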
Regarding:
I would remove this part, since it's not really correct. Some examples:
In my opinion it's incompatible with the broader OTel ecosystem. The OTel spec defines a resource in this section, https://opentelemetry.io/docs/specs/otel/overview/#resources:
IMO the profiler should also capture information about the entity in the Resource. Breaking this creates many problems:
With #609 there was a proposed change to lift commonly used lookup tables to a higher protocol field. At that time open-telemetry/community#2492 was under discussion and the SIG asked to wait. Having multiple |
@povilasv Thank you for the example, I'll update the issue. I took a look at the receivers mentioned, my understanding is that:
Overall, depending on how the telemetry is collected, it looks like the definition of a resource is either on the process level (process scraper) or on the container level (understandable since, for example, the kubelet doesn't expose metrics per process).
I believe there are some things we might want to consider with regard to lifting the attribute_table and string_table to the ProfilesData level:
Context
The intent of this issue is to discuss potential incompatibilities between the current OpenTelemetry protocol specification for profiles and the opentelemetry-ebpf-profiler, with regard to the usage of OTLP resources.
The opentelemetry-ebpf-profiler is a profiler which allows CPU profiling for all processes running on a single host.
The profiler will be built into a standalone collector distribution, where it will be configured as a collector receiver (RFC). This distribution would be deployed on each host for which the user wants to collect profiling data.
Problem Statement
Currently, by default, OTLP profiles generated by the opentelemetry-ebpf-profiler always contain a single Profile object (in a single ScopeProfile, in a single ResourceProfile), which contains stack traces for all the processes running on the host. When off-CPU profiling is enabled, this adds another ScopeProfile object under the same ResourceProfile.
As a consequence:
This doesn’t align with other signals (metrics, traces, logs), for which both host and container attributes are attached to the top-level resource attributes. As mentioned in open-telemetry/opentelemetry-collector-contrib#37269, this makes the opentelemetry-ebpf-profiler’s approach incompatible with the k8sattributesprocessor (the processor automatically enriches the data with Kubernetes metadata and expects Kubernetes attributes to be added in the top-level resource attributes).
However, in the current state of the profiling protocol specification, having Kubernetes attributes as top-level resource attributes would require splitting the profile into multiple resource profiles (instead of a single profile per payload), which leads to the following problems:
Defining resources from profiled processes
The opentelemetry-ebpf-profiler profiles all processes running on a host (whether they are in pods/containers, or not). It is unclear what the exact definition of a resource should be in that case:
In the current state, if we don’t intend to modify the model, and keep compatibility with the k8sattributesprocessor, we need at least one resource per container.
To some extent, the problem mentioned intersects with some of the challenges mentioned in Resources and Entities.
One additional challenge is that a resource is currently defined as the entity producing telemetry - strictly speaking, the opentelemetry-ebpf-profiler is producing the profiles for all processes running on the host. In that case, the entities being observed (the different processes) are different from the entity producing the telemetry (the opentelemetry-ebpf-profiler).
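To make the "one resource per container" option concrete, here is a rough sketch that regroups host-wide samples into one resource per container. It uses plain dictionaries rather than the real OTLP message types; `split_by_container`, the sample layout, and the example attribute values are assumptions made for illustration only.

```python
from collections import defaultdict


def split_by_container(samples, host_attrs):
    """Group samples by container.id and lift that attribute to the resource.

    Samples without a container.id (processes running directly on the host)
    end up in a resource that carries only the host attributes.
    """
    groups = defaultdict(list)
    for sample in samples:
        attrs = dict(sample["attributes"])
        container_id = attrs.pop("container.id", None)
        groups[container_id].append({"stack": sample["stack"], "attributes": attrs})

    resource_profiles = []
    for container_id, group in groups.items():
        resource = dict(host_attrs)
        if container_id is not None:
            resource["container.id"] = container_id
        resource_profiles.append({"resource": resource, "samples": group})
    return resource_profiles


samples = [
    {"stack": ["main", "serve"], "attributes": {"container.id": "abc", "process.pid": 10}},
    {"stack": ["main", "work"], "attributes": {"container.id": "def", "process.pid": 20}},
    {"stack": ["main", "serve"], "attributes": {"container.id": "abc", "process.pid": 11}},
    {"stack": ["systemd"], "attributes": {"process.pid": 1}},
]
profiles = split_by_container(samples, {"host.name": "node-1"})
```

Note that the sketch says nothing about the producer-versus-observed-entity question raised above; it only shows the mechanical regrouping.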
Performance impact on lookup tables
Depending on the resource definition we land on, we need to be mindful of the impact on the different lookup tables. Currently, the Profile object contains lookup tables that are used to deduplicate information from stack traces: for example, this avoids having to store repeatedly the same function names, or sample/location attributes.
The goal of these lookup tables is to keep the size of a profile reasonable. While this should only marginally affect the size of the payload on the wire (due to compression), it does affect the memory footprint of the decompressed and deserialized payload (in the eBPF profiler, then in the collector).
Splitting processes into multiple ResourceProfiles will mean that they will no longer share lookup tables. The granularity at which we split will influence the overhead (for example, splitting by process id will lead to drastically increased overhead compared to splitting per container, due to runtimes that fork often such as Python).
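The duplication cost can be illustrated with a toy count of string-table entries. The process IDs and function names below are made up for the example, and `table_entries` is a hypothetical helper; real overhead depends on the workload.

```python
def table_entries(groups):
    """Total string-table entries when every group keeps its own table."""
    return sum(len(set(names)) for names in groups.values())


# Three forked workers of the same program share most function names.
stacks_by_pid = {
    101: ["PyEval_EvalFrameDefault", "main", "handler"],
    102: ["PyEval_EvalFrameDefault", "main", "worker"],
    103: ["PyEval_EvalFrameDefault", "main", "worker"],
}

# One table shared by all processes: each distinct name is stored once.
shared_entries = len({name for names in stacks_by_pid.values() for name in names})

# One table per process: names common to several processes are stored repeatedly.
split_entries = table_entries(stacks_by_pid)
```

In this toy case the per-process split stores more than twice as many strings as the shared table, and the gap grows with every additional fork of the same program.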
We could consider moving the lookup tables to the ProfilesData level; however, this would make merging multiple ProfilesData payloads (e.g. for batching) harder, since it would require merging their lookup tables (which is possible, but could require further changes to the spec to do efficiently).
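As a sketch of what merging lookup tables involves, the following merges two string tables and produces the index remapping that every reference in the second payload would have to go through. `merge_string_tables` is a hypothetical helper working on plain lists, not on the OTLP message types.

```python
def merge_string_tables(base, other):
    """Merge `other` into `base`; return the merged table and a remap list.

    remap[i] is the index in the merged table of the string at other[i].
    """
    merged = list(base)
    index = {s: i for i, s in enumerate(merged)}
    remap = []
    for s in other:
        if s not in index:
            index[s] = len(merged)
            merged.append(s)
        remap.append(index[s])
    return merged, remap


table_a = ["", "main", "read"]
table_b = ["", "write", "main"]
merged, remap = merge_string_tables(table_a, table_b)

# Every string index stored in payload B must be rewritten through remap.
stack_b = [2, 1]                        # "main" -> "write" in table_b
rewritten = [remap[i] for i in stack_b]
```

The merge itself is linear in the table size, but the rewrite step touches every field in the second payload that stores an index, which is where the batching cost comes from.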
Recap
Currently, by default, the profiler generates OTLP payloads containing a single Profile object with stack traces for all processes on a host. This creates issues with resource attributes, particularly container attributes, which are attached to Sample attributes, not top-level resource attributes. This incompatibility makes the profiler's approach inconsistent with other signals and processors like the k8sattributesprocessor.
This leads to the following challenges:
The goal of this issue is to start a discussion on these challenges with impacted SIGs and determine next steps.