Registry manifest and Schema diff #400

Open: lquerel wants to merge 65 commits into main

@lquerel lquerel commented Oct 3, 2024

Note: The scope of this PR has been reduced to focus only on the schema diff feature. GitHub issues have been created to track the postponed features: #482, #483.

This PR implements the registry diff command; see the following example:

cargo run -- registry diff -r https://github.com/open-telemetry/semantic-conventions/archive/refs/tags/v1.27.0.zip[model] --baseline-registry https://github.com/open-telemetry/semantic-conventions/archive/refs/tags/v1.26.0.zip[model] --diff-format markdown

In this example, the diff is displayed in markdown format. The following formats are supported: json, markdown, ansi, ansi_stats. YAML format will be supported once PR #525 is finalized.

A detailed description of the schema diff data model and the diffing process is visible here.

Notes:

  • The crate weaver_otel_schema is not essential for this PR; it was initially included as part of the preparations for the registry schema-update command. We have decided to implement this command in a future PR. However, for simplicity, I prefer to keep the preparation code in place instead of removing it. The same applies to all_changes in weaver_version.

List of modifications to apply to the semantic conventions repository after the Weaver release containing this PR:

  • Add a registry-manifest.yaml file with the version of the next release.
  • Update all deprecated fields.

Closes: #186

The following command, which compares versions 1.29 and 1.30,

/weaver registry diff -r 'https://github.com/open-telemetry/semantic-conventions/archive/refs/tags/v1.30.0.zip[model]' --baseline-registry 'https://github.com/open-telemetry/semantic-conventions/archive/refs/tags/v1.29.0.zip[model]' --diff-format markdown

produces the following markdown output:

Registry Attributes

New registry_attributes:

  • Add aws.extended_request_id
  • Add azure.client.id
  • Add azure.cosmosdb.connection.mode
  • Add azure.cosmosdb.consistency.level
  • Add azure.cosmosdb.operation.contacted_regions
  • Add azure.cosmosdb.operation.request_charge
  • Add azure.cosmosdb.request.body.size
  • Add azure.cosmosdb.response.sub_status_code
  • Add cassandra.consistency.level
  • Add cassandra.coordinator.dc
  • Add cassandra.coordinator.id
  • Add cassandra.page.size
  • Add cassandra.query.idempotent
  • Add cassandra.speculative_execution.count
  • Add cicd.pipeline.result
  • Add cicd.pipeline.run.state
  • Add cicd.system.component
  • Add cicd.worker.state
  • Add code.column.number
  • Add code.file.path
  • Add code.function.name
  • Add code.line.number
  • Add db.system.name
  • Add elasticsearch.node.name
  • Add gen_ai.request.seed
  • Add k8s.namespace.phase
  • Add network.connection.state
  • Add security_rule.category
  • Add security_rule.description
  • Add security_rule.license
  • Add security_rule.name
  • Add security_rule.reference
  • Add security_rule.ruleset.name
  • Add security_rule.uuid
  • Add security_rule.version
  • Add vcs.repository.name

Deprecated registry_attributes:

  • code.column (Note: Deprecated, use code.column.number)
  • code.function (Note: Deprecated, use code.function.name instead)
  • code.lineno (Note: Deprecated, use code.line.number instead)
  • db.cassandra.consistency_level (Note: Deprecated, use cassandra.consistency.level instead.)
  • db.cassandra.coordinator.dc (Note: Deprecated, use cassandra.coordinator.dc instead.)
  • db.cassandra.coordinator.id (Note: Deprecated, use cassandra.coordinator.id instead.)
  • db.cassandra.idempotence (Note: Deprecated, use cassandra.query.idempotent instead.)
  • db.cassandra.page_size (Note: Deprecated, use cassandra.page.size instead.)
  • db.cassandra.speculative_execution_count (Note: Deprecated, use cassandra.speculative_execution.count instead.)
  • db.cosmosdb.client_id (Note: Deprecated, use azure.client.id instead.)
  • db.cosmosdb.connection_mode (Note: Deprecated, use azure.cosmosdb.connection.mode instead.)
  • db.cosmosdb.consistency_level (Note: Deprecated, use cosmosdb.consistency.level instead.)
  • db.cosmosdb.regions_contacted (Note: Deprecated, use azure.cosmosdb.operation.contacted_regions instead.)
  • db.cosmosdb.request_charge (Note: Deprecated, use azure.cosmosdb.operation.request_charge instead.)
  • db.cosmosdb.request_content_length (Note: Deprecated, use azure.cosmosdb.request.body.size instead.)
  • db.cosmosdb.sub_status_code (Note: Deprecated, use azure.cosmosdb.response.sub_status_code instead.)
  • db.elasticsearch.node.name (Note: Deprecated, use elasticsearch.node.name instead.)
  • db.elasticsearch.path_parts (Note: Deprecated, use db.operation.parameter instead.)
  • db.system (Note: Deprecated, use db.system.name instead.)
  • event.name (Note: Identifies the class / type of event.)
  • exception.escaped (Note: Indicates that the exception is escaping the scope of the span.)
  • gen_ai.openai.request.seed (Note: Deprecated, use gen_ai.request.seed.)
  • system.network.state (Note: Deprecated, use network.connection.state instead.)

Metrics

New metrics:

  • Add metric.azure.cosmosdb.client.active_instance.count
  • Add metric.azure.cosmosdb.client.operation.request_charge
  • Add metric.cicd.pipeline.run.active
  • Add metric.cicd.pipeline.run.duration
  • Add metric.cicd.pipeline.run.errors
  • Add metric.cicd.system.errors
  • Add metric.cicd.worker.count
  • Add metric.k8s.cronjob.active_jobs
  • Add metric.k8s.daemonset.current_scheduled_nodes
  • Add metric.k8s.daemonset.desired_scheduled_nodes
  • Add metric.k8s.daemonset.misscheduled_nodes
  • Add metric.k8s.daemonset.ready_nodes
  • Add metric.k8s.deployment.available_pods
  • Add metric.k8s.deployment.desired_pods
  • Add metric.k8s.hpa.current_pods
  • Add metric.k8s.hpa.desired_pods
  • Add metric.k8s.hpa.max_pods
  • Add metric.k8s.hpa.min_pods
  • Add metric.k8s.job.active_pods
  • Add metric.k8s.job.desired_successful_pods
  • Add metric.k8s.job.failed_pods
  • Add metric.k8s.job.max_parallel_pods
  • Add metric.k8s.job.successful_pods
  • Add metric.k8s.namespace.phase
  • Add metric.k8s.replicaset.available_pods
  • Add metric.k8s.replicaset.desired_pods
  • Add metric.k8s.replication_controller.available_pods
  • Add metric.k8s.replication_controller.desired_pods
  • Add metric.k8s.statefulset.current_pods
  • Add metric.k8s.statefulset.desired_pods
  • Add metric.k8s.statefulset.ready_pods
  • Add metric.k8s.statefulset.updated_pods
  • Add metric.vcs.change.time_to_merge

Deprecated metrics:

  • metric.db.client.cosmosdb.active_instance.count (Note: Deprecated)
  • metric.db.client.cosmosdb.operation.request_charge (Note: Deprecated)

Spans

New spans:

  • Add span.azure.cosmosdb.client

@lquerel lquerel self-assigned this Oct 3, 2024
@lquerel lquerel added the enhancement New feature or request label Oct 3, 2024
@lquerel lquerel changed the title from "[WIP] Registry manifest and OTEL schema update" to "[WIP] Registry manifest and Schema diff" on Nov 27, 2024
# Conflicts:
#	.clippy.toml
#	Cargo.toml
#	crates/weaver_semconv_gen/src/lib.rs
#	src/registry/search.rs
#	src/registry/stats.rs
#	src/registry/update_markdown.rs
Comment on lines 124 to 147
```yaml
# Version n+1
groups:
  - id: registry.network.deprecated
    type: attribute_group
    attributes:
      - id: net.peer.name
        type: string
        brief: Deprecated, use `server.address` on client spans and `client.address` on server spans.
        deprecated:
          type: conditionally_renamed
          forward: >
            switch span_kind {
              case 'client' => attributes['server.address'] = attributes['net.peer.name'],
              case 'server' => attributes['client.address'] = attributes['net.peer.name']
            }
          backward: >
            switch span_kind {
              case 'client' => attributes['net.peer.name'] = attributes['server.address'],
              case 'server' => attributes['net.peer.name'] = attributes['client.address']
            }
        stability: experimental
        examples: ['example.com']
```
Contributor

In this example, the attribute has been deprecated in the attribute_group but the condition is related to its usage in a span. Should the forward / backward instructions be on the span definition where net.peer.name is referenced? There may be different instructions for different spans that use net.peer.name.

Contributor Author

Good point. It's a good topic for discussion for our next Semantic Conventions Tooling SIG.

Contributor Author

Following the discussion in the last SIG meeting, it was decided to start with a more minimalist approach initially. As a result, I have updated this PR accordingly, both in the code and the main documentation (not this doc).

However, I am keeping this exploration document in place so that we can revisit this topic later if needed.
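
To make the conditional rename discussed in this thread concrete, here is a minimal sketch of what the `forward` and `backward` instructions from the exploration document could do to a span's attributes. This is only an illustration: the `SpanKind` enum, the `replacement_for` helper, and both functions are hypothetical, and the sketch assumes that a rename moves the value (the old key is removed), which the YAML above does not state explicitly.

```rust
use std::collections::HashMap;

/// Hypothetical span kind; only client and server are covered by the rule above.
#[derive(Clone, Copy, Debug, PartialEq)]
enum SpanKind {
    Client,
    Server,
    Other,
}

/// Name that replaces `net.peer.name` for a given span kind, if any.
fn replacement_for(kind: SpanKind) -> Option<&'static str> {
    match kind {
        SpanKind::Client => Some("server.address"),
        SpanKind::Server => Some("client.address"),
        // The example rule says nothing about other span kinds.
        SpanKind::Other => None,
    }
}

/// Forward migration: move `net.peer.name` to its replacement.
/// Assumption: the old key is removed rather than kept alongside the new one.
fn forward(kind: SpanKind, attrs: &mut HashMap<String, String>) {
    if let Some(target) = replacement_for(kind) {
        if let Some(value) = attrs.remove("net.peer.name") {
            attrs.insert(target.to_string(), value);
        }
    }
}

/// Backward migration: restore `net.peer.name` from its replacement.
fn backward(kind: SpanKind, attrs: &mut HashMap<String, String>) {
    if let Some(source) = replacement_for(kind) {
        if let Some(value) = attrs.remove(source) {
            attrs.insert("net.peer.name".to_string(), value);
        }
    }
}
```

Because the rule depends on the span kind, the same attribute-level definition yields different results per signal, which is the concern raised in the first comment of this thread.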

Comment on lines 433 to 451
```yaml
# Version n+1
groups:
  - id: registry.db.deprecated
    type: attribute_group
    stability: experimental
    attributes:
      - id: db.instance.id
        type: string
        brief: 'Deprecated, no general replacement at this time. For Elasticsearch, use `db.elasticsearch.node.name` instead.'
        deprecated:
          type: conditionally_deprecated
          forward: >
            if attributes['db.system'] == 'elasticsearch' then attributes['db.elasticsearch.node.name'] = attributes['db.instance.id']
            else drop attributes['db.instance.id']
          backward: >
            if attributes['db.system'] == 'elasticsearch' then attributes['db.instance.id'] = attributes['db.elasticsearch.node.name']
        stability: experimental
```
Contributor

Like my previous comment. When the instructions involve other attributes then the change is dependent on the presence of those attributes in that signal definition. Perhaps defining the instructions on the attribute_group means it's universal for all uses via references? Maybe this can be overridden with further instructions on the span, for example, to deviate from this default?

Contributor Author

Yes, in the attribute_group, the setting is intended to be universal. However, I agree that we should demonstrate how this setting can be overridden when dealing with a specific signal.

Contributor Author

See my previous comment.
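
The "universal default with a per-signal override" idea raised in this thread could be sketched as a two-level lookup. Nothing like this is part of the PR, which took the minimalist approach; the `DeprecationRules` struct, its fields, and the signal id used in the usage note are all hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical store of deprecation instructions.
struct DeprecationRules {
    /// Defaults declared on the attribute_group: attribute id -> instruction.
    defaults: HashMap<String, String>,
    /// Overrides declared on a signal: (signal id, attribute id) -> instruction.
    overrides: HashMap<(String, String), String>,
}

impl DeprecationRules {
    /// Return the signal-level override if one exists, else the attribute_group default.
    fn resolve(&self, signal_id: &str, attribute_id: &str) -> Option<&str> {
        self.overrides
            .get(&(signal_id.to_string(), attribute_id.to_string()))
            .or_else(|| self.defaults.get(attribute_id))
            .map(String::as_str)
    }
}
```

With a default registered for `db.instance.id` and an override for a made-up signal such as `span.example`, `resolve("span.example", "db.instance.id")` returns the override, while any other signal id falls back to the default.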

},
/// A top-level telemetry object from the baseline registry was marked as deprecated in the head
/// registry.
Deprecated {
Contributor

Is this its own change or should it be attached to other changes?

I.e. is it just an implication of change?

I think this was called out verbally, but it's the one I'm least sure of belonging with other "semantic" changes, especially given "uncategorized" as an option.

Contributor Author

See my comment below.

Contributor

Responded. Once updated, this PR LGTM.

Structurally/rust-wise you have all the pieces I'd look for. It's just naming/surface syntax at this point.

- `renamed`: A top-level telemetry object from the baseline registry was renamed in the head registry.
- `deprecated`: A top-level telemetry object from the baseline registry was marked as deprecated in the head registry.
- `updated`: One or more fields in a top-level telemetry object have been updated in the head registry.
- `removed`: A top-level telemetry object from the baseline registry was removed in the head registry.
Contributor

For semconv specifically, we definitely don't want to allow this, instead we deprecate.

Also - my concern with "deprecated" is that when we rename, we're effectively deprecating the old.

I'm reading this and think "deprecated" is too generic and too much of a catch-all. I'd rather use "uncategorized", where deprecation is a consequence of the change vs. the change itself.

I.e. we almost need a "removed" where we mark the type as deprecated and prevent further usage but don't remove our knowledge that it once existed.

Contributor Author

The deprecated type is indeed probably too much of a catch-all. However, I believe these three types are truly distinct, and I probably didn’t do a great job explaining them.

Currently, the general concept of deprecation is used for several types of changes in semantic conventions (renaming, “soft” removal, and other exotic changes). I propose refining my initial suggestion and the corresponding definitions as follows:

  • Rename the change type deprecated to obsoleted to clearly indicate that this change corresponds to an attribute or a signal that is discontinued without a valid replacement.
  • In my view, removed should exist at the Weaver level, if only to identify that there has been an actual deletion in a registry under validation. This type of change should never be issued for a published registry, but it is clearly a transitional change that can occur during the development of a registry. We could even build a policy leveraging this type of change in the future.
  • uncategorized is the catch-all change type representing all complex types of changes that we haven’t precisely codified. The idea of this type, as you mentioned during the meeting, is that we should gradually eliminate it from the registry.

Do we agree on this definition of things?

Contributor

I like moving deprecated to obsoleted where deprecated can remain as a catch-all for "we changed this thing in a way" and obsoleted implies "do not use anymore, here for legacy reasons".

I agree we need to actually model removed in some way. obsoleted as soft-delete works for me.

So yes, I agree on this.
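
The taxonomy agreed on in this thread could be modeled roughly as follows. This is a sketch under stated assumptions, not the enum in the PR: the `ChangeType` name, its variants' fields, and both helper methods are hypothetical. It encodes two points from the discussion: `removed` should never appear in a published registry, and `uncategorized` is a catch-all to be eliminated over time.

```rust
/// Hypothetical model of the change types discussed above.
#[derive(Debug, Clone, PartialEq)]
enum ChangeType {
    /// The object was renamed in the head registry.
    Renamed { new_name: String },
    /// The object is discontinued without a valid replacement (soft delete).
    Obsoleted { note: String },
    /// One or more fields of the object were updated.
    Updated,
    /// The object was actually deleted from the registry.
    Removed,
    /// A complex change (split, merge, ...) that is not precisely codified yet.
    Uncategorized { note: String },
}

impl ChangeType {
    /// `Removed` is a transitional change that a published registry should never contain.
    fn allowed_in_published_registry(&self) -> bool {
        !matches!(self, ChangeType::Removed)
    }

    /// `Uncategorized` changes are candidates for a more precise change type later.
    fn needs_categorization(&self) -> bool {
        matches!(self, ChangeType::Uncategorized { .. })
    }
}
```

A policy check like the one mentioned in the thread could then reject any diff for which `allowed_in_published_registry` returns `false`.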

@lquerel lquerel requested a review from a team as a code owner January 31, 2025 02:07
Contributor

@jerbly jerbly left a comment

A few doc nits. Plus a schema suggestion to avoid obscuring the original brief for the item.

Comment on lines +33 to +36
/// complex reasons (split, merge, ...) which are currently not precisely define
/// in the supported deprecation reasons.
///
/// The `brief` field should contain the reason why the field has been obsoleted.
Contributor

Suggested change
- /// complex reasons (split, merge, ...) which are currently not precisely define
- /// in the supported deprecation reasons.
- ///
- /// The `brief` field should contain the reason why the field has been obsoleted.
+ /// complex reasons (split, merge, ...) which are currently not precisely defined
+ /// in the supported deprecation reasons.
+ ///
+ /// The `brief` field should contain the reason for this uncategorized deprecation.

},
{
"type": "object",
"description": "The telemetry object containing the deprecated field has been deprecated for complex reasons (split, merge, ...) which are currently not precisely define in the supported deprecation reasons.",
Contributor

Suggested change
- "description": "The telemetry object containing the deprecated field has been deprecated for complex reasons (split, merge, ...) which are currently not precisely define in the supported deprecation reasons.",
+ "description": "The telemetry object containing the deprecated field has been deprecated for complex reasons (split, merge, ...) which are currently not precisely defined in the supported deprecation reasons.",


Variant 1
```yaml
brief: <text explaining the reason of the renaming>
```
Contributor

Would it be clearer for the deprecated object to have its own note explaining the reason for deprecation, so that the existing brief can remain intact? E.g.

brief: Here is the original brief for the attribute 
deprecated:
  explanation: <text explaining the reason of the renaming>
  reason: renamed
  renamed_to: <name of the telemetry object>

Contributor

We currently replace the original brief with something like "Deprecated, use 'another.attribute' instead", but nothing stops us from keeping the original brief or adding deprecation details. There is also the attribute note, so we could decide to keep the brief around and add deprecation details to the note.

I'd prefer not to add new properties (since we already have brief and note), but it's not a strong opinion.

@@ -41,6 +41,48 @@
}
},
"$defs": {
"Deprecated": {
Contributor

could you please also update https://github.com/open-telemetry/weaver/blob/main/schemas/semconv-syntax.md - an informal and reader-friendly version of it?


registry_attributes:
  - name: http.server_name # attribute name
    type: obsoleted # change type
    note: This attribute is deprecated.
Contributor

is the note populated from note or brief?

Labels
enhancement New feature or request
Projects
Status: Next Release
Development

Successfully merging this pull request may close these issues.

Automate OTEL Schema Generation and Update Process with Migration Guide Support
4 participants