
Use weaver for semantic convention codegen #2098

Open: wants to merge 23 commits into main


lquerel

@lquerel lquerel commented Sep 10, 2024

Changes

This PR marks the first step in the migration to Weaver for semconv code generation. The following changes have been made:

  • Utilized Weaver’s new capabilities to simplify code generation as much as possible.
  • Added a blank line between each declaration for improved readability.
  • Attribute notes have been included in comments.
  • String examples are now represented using double quotes.
  • Boolean examples are represented with true or false, instead of 'True' or 'False'.
  • Number examples are represented without single quotes.
  • Experimental attributes are now gated behind the semconv_experimental feature.
  • Deprecated attributes include a note explaining the reason for deprecation.
  • Sorted metric attributes by name.
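The list above describes the new output only in words. Below is a hand-written sketch of how a stable, an experimental, and a deprecated declaration could look under these rules. It is an illustration under assumptions, not actual Weaver output. It keeps the existing `&str` constant style, and the doc wording and deprecation note are hypothetical. The attribute names (`http.request.method`, the experimental `http.request.body.size`, and the deprecated `http.method`) come from the semantic conventions.

```rust
/// HTTP request method.
///
/// ## Examples:
/// - "GET"
/// - "POST"
pub const HTTP_REQUEST_METHOD: &str = "http.request.method";

/// The size of the request payload body in bytes.
///
/// ## Examples:
/// - 3495
// Experimental attributes only exist when the crate is built with the
// `semconv_experimental` Cargo feature enabled (hypothetical example).
#[cfg(feature = "semconv_experimental")]
pub const HTTP_REQUEST_BODY_SIZE: &str = "http.request.body.size";

/// Deprecated, use `http.request.method` instead.
// The deprecation reason is reported as a compiler warning at each use site.
#[deprecated(note = "Replaced by `http.request.method`.")]
pub const HTTP_METHOD: &str = "http.method";
```

String examples are double-quoted and the number example is unquoted, which matches the list. Code that still uses `HTTP_METHOD` keeps compiling but receives a warning that carries the note.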

Note: This PR represents the first step of the plan described here.

Merge requirement checklist

  • CONTRIBUTING guidelines followed
  • Unit tests added/updated (if applicable)
  • Appropriate CHANGELOG.md files updated for non-trivial, user-facing changes
  • Changes in public API reviewed (if applicable)

@@ -20,54 +20,24 @@ git fetch origin "v$SPEC_VERSION"
git reset --hard FETCH_HEAD
cd "$CRATE_DIR"

docker run --rm \
Author


We only need to call Weaver once, as the weaver.yaml file specifies what needs to be generated.

@lquerel lquerel changed the title [WIP] Use weaver for semantic convention codegen Use weaver for semantic convention codegen Sep 11, 2024
@lquerel lquerel marked this pull request as ready for review September 11, 2024 21:26
@lquerel lquerel requested a review from a team September 11, 2024 21:26
Contributor

@TommyCpp TommyCpp left a comment


Per my understanding, this PR only changes the generation tool we use. The end result is still a list of static strings, right?

@lquerel
Author

lquerel commented Sep 11, 2024

> Per my understanding, this PR only changes the generation tool we use. The end result is still a list of static strings, right?

@TommyCpp @cijothomas

Yes, the initial intent was to minimize differences with the existing code generation first and then propose a more advanced integration that removes static strings entirely. However, if the community prefers to move directly to a step where static strings are replaced, such as for attribute declarations like:

```rust
pub const HTTP_REQUEST_METHOD: crate::attributes::AttributeKey<HttpRequestMethod> = crate::attributes::AttributeKey::new("http.request.method");
```

I am happy to update this PR accordingly. This would be straightforward, since I already have a proof of concept (POC) for step 2 described in #2100. Even if we don't move directly to step 2, which is a fully type-safe API for Rust semconv, we could create an intermediate step (let's call it step 1.5) where static strings are no longer visible to users or the compiler. I'd appreciate more feedback on this approach.
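For comparison, step 2 could replace ad hoc key/value lists with generated structs. Later in this thread the goal is described as using structs for both mandatory and optional attributes. The sketch below follows that idea under assumptions: the struct `HttpServerRequestAttributes`, its fields, and `to_key_values` are hypothetical and not part of the POC. To stay self-contained, it emits plain `(&'static str, String)` pairs in place of `opentelemetry::KeyValue`.

```rust
/// HTTP request method (small subset, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum HttpRequestMethod {
    Get,
    Post,
}

impl HttpRequestMethod {
    pub fn as_str(&self) -> &'static str {
        match self {
            HttpRequestMethod::Get => "GET",
            HttpRequestMethod::Post => "POST",
        }
    }
}

/// Hypothetical generated struct: required attributes are plain fields and
/// optional ones are `Option`s, so omitting a required attribute fails to compile.
#[derive(Debug, Clone)]
pub struct HttpServerRequestAttributes {
    /// Required: `http.request.method`.
    pub http_request_method: HttpRequestMethod,
    /// Optional: `server.address`.
    pub server_address: Option<String>,
    /// Optional: `server.port`.
    pub server_port: Option<u16>,
}

impl HttpServerRequestAttributes {
    /// Flattens the struct into key/value pairs, skipping unset optional attributes.
    pub fn to_key_values(&self) -> Vec<(&'static str, String)> {
        let mut kvs = vec![(
            "http.request.method",
            self.http_request_method.as_str().to_string(),
        )];
        if let Some(addr) = &self.server_address {
            kvs.push(("server.address", addr.clone()));
        }
        if let Some(port) = self.server_port {
            kvs.push(("server.port", port.to_string()));
        }
        kvs
    }
}
```

In this design the attribute keys never appear in user code. Users fill in typed fields, and only the generated code knows the key strings.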

opentelemetry-semantic-conventions/CHANGELOG.md (outdated, resolved)
@@ -1,60 +1,79 @@
// DO NOT EDIT, this is an auto-generated file
//
// If you want to update the file:
// - Edit the template at scripts/templates/semantic_attributes.rs.j2
// - Edit the template at scripts/templates/registry/rust/attributes.rs.j2
Member

@lalitb lalitb Sep 12, 2024


The attributes.rs file currently defines semconv constants, which are then re-exported in various other modules (e.g. trace.rs, metrics.rs, resource.rs). If there aren't any (or not too many) definitions shared across multiple modules, would it make sense to move these constants to the modules where they actually belong? attributes.rs will grow over time as we add more definitions, and later the semconv for logs, so it could become difficult to manage even though it is auto-generated.

I know this has been the existing behavior, but now is the right time to decide on any such change.
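Here is a minimal sketch of the current layout described here: one module defines every key, and the signal-specific modules re-export the keys they need. The module names mirror the file names in the comment. The constants are illustrative and not copied from the generated crate.

```rust
/// Single source of truth for attribute keys (one large generated file today).
pub mod attributes {
    /// Used by both spans and metrics.
    pub const SERVER_ADDRESS: &str = "server.address";
    /// Used by both spans and metrics.
    pub const HTTP_REQUEST_METHOD: &str = "http.request.method";
}

/// Signal-specific modules re-export the subset they use.
pub mod trace {
    pub use super::attributes::{HTTP_REQUEST_METHOD, SERVER_ADDRESS};
}

pub mod metrics {
    pub use super::attributes::{HTTP_REQUEST_METHOD, SERVER_ADDRESS};
}
```

Under the proposal, a key used by only one signal would be declared in that signal's module directly. Keys shared across signals, such as `server.address` here, are the reason a common module exists, so the number of shared definitions decides whether splitting is worthwhile.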

Member


Just to add: this is not a blocker for the PR, just something to discuss and plan for a subsequent PR if everyone agrees.

@lalitb
Member

lalitb commented Sep 12, 2024

> pub const HTTP_REQUEST_METHOD: crate::attributes::AttributeKey<HttpRequestMethod> = crate::attributes::AttributeKey::new("http.request.method");

@lquerel This looks awesome: in this case it will ensure that HTTP_REQUEST_METHOD always holds a valid HTTP verb. Thinking in terms of performance, we would likely encapsulate these verbs in an enum and then wrap the &'static str for http.request.method in an AttributeKey struct. Do you see any performance overhead when using them (specifically) in the hot path, compared to the existing approach?

@lquerel
Author

lquerel commented Sep 12, 2024

> pub const HTTP_REQUEST_METHOD: crate::attributes::AttributeKey<HttpRequestMethod> = crate::attributes::AttributeKey::new("http.request.method");
>
> @lquerel This looks awesome: in this case it will ensure that HTTP_REQUEST_METHOD always holds a valid HTTP verb. Thinking in terms of performance, we would likely encapsulate these verbs in an enum and then wrap the &'static str for http.request.method in an AttributeKey struct. Do you see any performance overhead when using them (specifically) in the hot path, compared to the existing approach?

@lalitb Yes, the verbs are indeed represented as an enum in the POC, as shown in the full declaration below. Regarding performance, HTTP_REQUEST_METHOD is a constant, so the overhead is minimal and mostly confined to the AttributeKey::value method, which can potentially be inlined. I also believe the current POC can be slightly optimized to avoid cloning the &str. While I don't anticipate a noticeable overhead, we should confirm this with a proper benchmark. This approach is intermediary. The ultimate goal is a fully type-safe API, where all hashmaps are replaced by structs for both mandatory and optional attributes. In addition, metrics, spans, etc. will be predeclared with all their metadata, such as instrument, description, and unit, which is both more efficient and easier for end users. The SDK will also be optimized to leverage this code generation. Once we reach that stage, I expect a significant reduction in overhead (to be clear, that is not part of this PR).

```rust
use opentelemetry::{Key, KeyValue, StringValue};

/// Attributes for the `client` namespace.
/// Attributes for the `error` namespace.
/// Attributes for the `exception` namespace.
/// Attributes for the `http` namespace.
/// Attributes for the `network` namespace.
/// Attributes for the `server` namespace.
/// Attributes for the `system` namespace.
/// Attributes for the `url` namespace.

/// A typed attribute key.
pub struct AttributeKey<T> {
    key: Key,
    phantom: std::marker::PhantomData<T>,
}

impl<T> AttributeKey<T> {
    /// Returns a new [`AttributeKey`] with the given key.
    #[must_use]
    pub(crate) const fn new(key: &'static str) -> AttributeKey<T> {
        Self {
            key: Key::from_static_str(key),
            phantom: std::marker::PhantomData,
        }
    }

    /// Returns the key of the attribute.
    #[must_use]
    pub fn key(&self) -> &Key {
        &self.key
    }
}

impl AttributeKey<StringValue> {
    /// Returns a [`KeyValue`] pair for the given value.
    #[must_use]
    pub fn value(&self, v: StringValue) -> KeyValue {
        KeyValue::new(self.key.clone(), v)
    }
}

impl AttributeKey<i64> {
    /// Returns a [`KeyValue`] pair for the given value.
    #[must_use]
    pub fn value(&self, v: i64) -> KeyValue {
        KeyValue::new(self.key.clone(), v)
    }
}

impl AttributeKey<f64> {
    /// Returns a [`KeyValue`] pair for the given value.
    #[must_use]
    pub fn value(&self, v: f64) -> KeyValue {
        KeyValue::new(self.key.clone(), v)
    }
}

impl AttributeKey<bool> {
    /// Returns a [`KeyValue`] pair for the given value.
    #[must_use]
    pub fn value(&self, v: bool) -> KeyValue {
        KeyValue::new(self.key.clone(), v)
    }
}

/// HTTP request method.
///
/// HTTP request method value SHOULD be "known" to the instrumentation.
/// By default, this convention defines "known" methods as the ones listed in [RFC9110](https://www.rfc-editor.org/rfc/rfc9110.html#name-methods)
/// and the PATCH method defined in [RFC5789](https://www.rfc-editor.org/rfc/rfc5789.html).
///
/// If the HTTP request method is not known to instrumentation, it MUST set the `http.request.method` attribute to `_OTHER`.
///
/// If the HTTP instrumentation could end up converting valid HTTP request methods to `_OTHER`, then it MUST provide a way to override
/// the list of known HTTP methods. If this override is done via environment variable, then the environment variable MUST be named
/// OTEL_INSTRUMENTATION_HTTP_KNOWN_METHODS and support a comma-separated list of case-sensitive known HTTP methods
/// (this list MUST be a full override of the default known method, it is not a list of known methods in addition to the defaults).
///
/// HTTP method names are case-sensitive and `http.request.method` attribute value MUST match a known HTTP method name exactly.
/// Instrumentations for specific web frameworks that consider HTTP methods to be case insensitive, SHOULD populate a canonical equivalent.
/// Tracing instrumentations that do so, MUST also set `http.request.method_original` to the original value.
///
/// ## Examples:
/// - GET
/// - POST
/// - HEAD
pub const HTTP_REQUEST_METHOD: crate::attributes::AttributeKey<HttpRequestMethod> = crate::attributes::AttributeKey::new("http.request.method");

/// HTTP request method
#[derive(Debug, Clone)]
#[non_exhaustive]
pub enum HttpRequestMethod {
    /// CONNECT method.
    Connect,
    /// DELETE method.
    Delete,
    /// GET method.
    Get,
    /// HEAD method.
    Head,
    /// OPTIONS method.
    Options,
    /// PATCH method.
    Patch,
    /// POST method.
    Post,
    /// PUT method.
    Put,
    /// TRACE method.
    Trace,
    /// Any HTTP method that the instrumentation has no prior knowledge of.
    Other,
    /// This variant allows defining a custom entry in the enum.
    _Custom(String),
}

impl HttpRequestMethod {
    /// Returns the string representation of the [`HttpRequestMethod`].
    #[must_use]
    pub fn as_str(&self) -> &str {
        match self {
            HttpRequestMethod::Connect => "CONNECT",
            HttpRequestMethod::Delete => "DELETE",
            HttpRequestMethod::Get => "GET",
            HttpRequestMethod::Head => "HEAD",
            HttpRequestMethod::Options => "OPTIONS",
            HttpRequestMethod::Patch => "PATCH",
            HttpRequestMethod::Post => "POST",
            HttpRequestMethod::Put => "PUT",
            HttpRequestMethod::Trace => "TRACE",
            HttpRequestMethod::Other => "_OTHER",
            HttpRequestMethod::_Custom(v) => v.as_str(),
            // Without this default case, the match expression would not
            // contain any variants if all variants are annotated with the
            // 'semconv_experimental' feature and the feature is not enabled.
            #[allow(unreachable_patterns)]
            _ => unreachable!(),
        }
    }
}

impl core::fmt::Display for HttpRequestMethod {
    /// Formats the value using the given formatter.
    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
        write!(f, "{}", self.as_str())
    }
}

impl crate::attributes::AttributeKey<HttpRequestMethod> {
    /// Returns a [`KeyValue`] pair for the given value.
    #[must_use]
    pub fn value(&self, v: &HttpRequestMethod) -> opentelemetry::KeyValue {
        opentelemetry::KeyValue::new(self.key.clone(), v.to_string())
    }
}
```
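The POC's `value` method above calls `v.to_string()`, which allocates a `String` on every call. The sketch below shows one way to avoid that allocation. Known variants map to `&'static str`, and only `_Custom` produces an owned string, using `Cow<'static, str>`. This is a hypothetical variant, not the POC. `KeyValue` here is a stand-in struct rather than `opentelemetry::KeyValue`, and `to_cow` is an illustrative name. Whether the real `opentelemetry` value types can carry a borrowed `'static` string this way would need to be checked.

```rust
use std::borrow::Cow;
use std::marker::PhantomData;

/// Stand-in for a key/value pair (hypothetical; not `opentelemetry::KeyValue`).
#[derive(Debug, PartialEq)]
pub struct KeyValue {
    pub key: &'static str,
    pub value: Cow<'static, str>,
}

#[derive(Debug, Clone, PartialEq)]
#[non_exhaustive]
pub enum HttpRequestMethod {
    Get,
    Post,
    Other,
    _Custom(String),
}

impl HttpRequestMethod {
    /// Borrows a `'static` string for known variants; clones only for `_Custom`.
    pub fn to_cow(&self) -> Cow<'static, str> {
        match self {
            HttpRequestMethod::Get => Cow::Borrowed("GET"),
            HttpRequestMethod::Post => Cow::Borrowed("POST"),
            HttpRequestMethod::Other => Cow::Borrowed("_OTHER"),
            HttpRequestMethod::_Custom(v) => Cow::Owned(v.clone()),
        }
    }
}

/// Minimal typed key with the same shape as the POC's `AttributeKey<T>`.
pub struct AttributeKey<T> {
    key: &'static str,
    _phantom: PhantomData<T>,
}

impl<T> AttributeKey<T> {
    pub const fn new(key: &'static str) -> Self {
        Self { key, _phantom: PhantomData }
    }
}

impl AttributeKey<HttpRequestMethod> {
    /// Unlike `v.to_string()`, this does not allocate for known methods.
    pub fn value(&self, v: &HttpRequestMethod) -> KeyValue {
        KeyValue { key: self.key, value: v.to_cow() }
    }
}

pub const HTTP_REQUEST_METHOD: AttributeKey<HttpRequestMethod> =
    AttributeKey::new("http.request.method");
```

Even with this change, a benchmark against the existing `&'static str` constants would still be needed to confirm the hot-path cost, as suggested in the thread.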

@lalitb
Member

lalitb commented Sep 12, 2024

> This approach is intermediary; the ultimate goal is to provide a fully type-safe API, where all hashmaps are replaced by structs for both mandatory and optional attributes

Thanks @lquerel for the details. We have seen performance issues with the value part of KeyValue in the past, hence the concern. I would vote for keeping this PR close to the current approach of exposing &'static str constants and then moving to the type-safe API, unless other @open-telemetry/rust-maintainers / @open-telemetry/rust-approvers think differently.

@@ -6,7 +6,7 @@ CRATE_DIR="${SCRIPT_DIR}/../"

# freeze the spec version and generator version to make generation reproducible
SPEC_VERSION=1.27.0
SEMCOVGEN_VERSION=0.25.0
WEAVER_VERSION=v0.9.2
Member


To clarify, the process for generating the code remains unchanged, correct? Specifically, we still increment the spec/weaver version here and run this script as usual?

Author


Yes, correct.

@lalitb
Member

lalitb commented Sep 13, 2024

@lquerel If everyone is aligned with the direction, could you address the (nit) review comments, if any, and fix the CI so that this can proceed to further review and approval?

@lquerel lquerel requested a review from a team as a code owner September 18, 2024 18:44

codecov bot commented Sep 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78.3%. Comparing base (3976f3d) to head (6f9a59c).
Report is 1 commits behind head on main.

Additional details and impacted files
@@          Coverage Diff          @@
##            main   #2098   +/-   ##
=====================================
  Coverage   78.3%   78.3%           
=====================================
  Files        121     121           
  Lines      20767   20767           
=====================================
+ Hits       16268   16269    +1     
+ Misses      4499    4498    -1     


@lquerel
Author

lquerel commented Sep 18, 2024

@lalitb I've fixed most of the issues, but a few problems remain, caused by unescaped Markdown links in the semconv registry: cargo doc rejects Markdown text such as blabla [0,n] blabla. Rather than working around this in the Rust template, I plan to add a new configuration option to the comment filter that automatically escapes invalid links (see issue #374). This should be part of the next release.
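To show what escaping invalid links could mean in practice, here is a small sketch of such a filter. It is not Weaver's implementation, and the function name `escape_non_link_brackets` is hypothetical. It backslash-escapes any `[...]` span that is not immediately followed by `(`, so inline Markdown links such as `[RFC9110](https://...)` are kept. It does not handle nested brackets or code spans. It would also escape intentional rustdoc intra-doc links such as [`KeyValue`], so a real filter would need to be more selective.

```rust
/// Hypothetical filter: escapes `[...]` spans that are not inline Markdown
/// links (`[text](url)`), so rustdoc does not try to treat them as links.
pub fn escape_non_link_brackets(text: &str) -> String {
    let chars: Vec<char> = text.chars().collect();
    let mut out = String::with_capacity(text.len());
    let mut i = 0;
    while i < chars.len() {
        if chars[i] == '[' {
            // Find the first closing bracket after this one (no nesting support).
            if let Some(rel) = chars[i + 1..].iter().position(|&c| c == ']') {
                let close = i + 1 + rel;
                let inner: String = chars[i + 1..close].iter().collect();
                // An inline link is `[text]` immediately followed by `(`.
                if chars.get(close + 1) == Some(&'(') {
                    out.push('[');
                    out.push_str(&inner);
                    out.push(']');
                } else {
                    out.push_str("\\[");
                    out.push_str(&inner);
                    out.push_str("\\]");
                }
                i = close + 1;
                continue;
            }
        }
        // Any other character, including an unmatched `[`, is copied as-is.
        out.push(chars[i]);
        i += 1;
    }
    out
}
```

Applied to the example from the comment, `blabla [0,n] blabla` becomes `blabla \[0,n\] blabla`, which Markdown renders as literal brackets.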

Labels
None yet
Projects
None yet
Development


5 participants