Add otel tracing implementation support #2578
base: master
Conversation
Signed-off-by: Lucas Thiesen <[email protected]>
-	// Tracer is the opentracing tracer to use in the client
-	Tracer opentracing.Tracer
+	// Tracer is the open telemetry tracer to use in the client
+	Tracer trace.Tracer
Sometimes I see Tracer and other times OtelTracer, is this intended?
Yeah, in some places I used OtelTracer to differentiate from Tracer and OtTracer. The former is only OT where we need to be backward compatible, and the latter is kept in places where we need access to both tracers and I wanted to make it clear that it is one or the other. In the majority of cases it should be just tracer or Tracer.
Also, when we discontinue OpenTracing, which will probably take a while, we can have all of them as just Tracer/tracer.
Regarding how I tested these changes locally, the strategy I used was unit tests + stdouttrace, which is a tracer that implements all behaviors supported by the Otel SDK (OTLP). I didn't test it directly with specific collectors or vendors like lightstep; I was planning to run skipper on a custom cluster and test it with lightstep collectors there before merging. What is not well tested is the config part: the arguments we would pass as skipper parameters to configure the communication between the skipper tracer and a trace collector or commercial tracing solution. cc: @MustafaSaber
@@ -83,7 +83,7 @@ func TestFifoChanges(t *testing.T) {
	t.Helper()

	client := net.NewClient(net.Options{
-		ResponseHeaderTimeout: 2 * time.Second,
+		ResponseHeaderTimeout: 3 * time.Second,
why was this changed?
It was flaky and was failing most of the time in my local environment; increasing the timeout solved the problem.
	}),
)
// tee filters override if we have a tracer
if len(o.OpenTracing) > 0 {
does the override only apply for opentracing?
This option has a very unfortunate name, but it contains both OpenTracing and OpenTelemetry tracers. I could not change the name to something more meaningful than Options.OpenTracing without breaking backward compatibility with library users, which is not great, but we don't have much choice. :/
		}
		globalTags[tag] = tagVal
	}
case "otlp-exporter-endpoint":
From testing in the cluster: maybe a URL?
Yeah, I'll try to test with a local OTLP collector to make sure no parameter is missing. But one of the things we need to change: this parameter should not be split into address and port only, because we need the URL path to route to the proper OTLP endpoint in most collectors.
Thank you for this big effort. I did a first review pass and left my comments.
	},
	DNSDone: func(httptrace.DNSDoneInfo) {
-		span.LogKV("DNS", "end")
+		span.AddEvent("DNS", trace.WithAttributes(attribute.String("DNS", "stop")))
Why was "end" changed to "stop"?
Oh, this is a mistake. Good catch!
	originalRequest.Body = body
	cc.request = cr
	return cc, nil
}

func (c *context) Loopback() {
	loopSpan := c.Tracer().StartSpan(c.proxy.tracing.initialOperationName, opentracing.ChildOf(c.ParentSpan().Context()))
	defer loopSpan.Finish()
	_, loopSpan := c.Tracer().Start(c.request.Context(), c.proxy.tracing.initialOperationName)
What return value is ignored here and why?
This is a stdlib context, and it doesn't seem to be used in the function. One thing that could be done is to update the request context so it carries the span information forward in case any other function or package wants to use it. But in theory this was not done in the past either: Ot.ContextWithSpan(...) was never called here, so the context was never updated. The only difference is that Otel.Tracer.Start(...) always returns a context, which was not true for the OT implementation.
	// Tracer holds the opentracing tracer enabled for this proxy instance
	// DEPRECATED, use TracerWrapper.
	Tracer ot.Tracer

	// OtelTracer holds the open telemetry tracer enabled for this proxy instance
	OtelTracer trace.Tracer
Here and elsewhere we keep old and new tracer fields.
Why not use only OTEL tracer and provide users an adapter for opentracing tracer?
Actually we do this from skipper.Run() point of view @lucastt , so we likely can delete proxy.Tracer
Yeah, it is possible. I just considered this as something that people using skipper as a library could be relying on, so I wanted to keep backward compatibility. Under the hood the old tracer is converted into a tracerWrapper instance that converts calls from OTEL to OT.
So if this part does not need to be backward compatible, it would be much better to have a single tracer instance.
type otelTransport struct {
	tr     *http.Transport
	tracer trace.Tracer
}
Why do we need this? Somehow we created proxy span without it before.
This is an interesting question and I think it's how OTel is different from OT. Please confirm @lucastt
Yeah, maybe this is not necessary. The goal of this struct is to inject the span into the request, but we already do this in net/httpclient.go:406, so the only case where this struct would be necessary is if we make requests using this RoundTripper directly, without going through the skipper http client.
@@ -1173,14 +1211,14 @@ func (p *Proxy) do(ctx *context, parentSpan ot.Span) (err error) {
		ctx.ensureDefaultResponse()
	} else if ctx.route.BackendType == eskip.LoopBackend {
		loopCTX := ctx.clone()
-		loopSpan := tracing.CreateSpan("loopback", ctx.request.Context(), p.tracing.tracer)
+		_, loopSpan := p.tracing.Start(ctx.request.Context(), "loopback")
What return value is ignored here and why?
@@ -1310,7 +1354,7 @@ func (p *Proxy) serveResponse(ctx *context) {
		p.tracing.setTag(ctx.proxySpan, ClientRequestStateTag, ClientRequestCanceled)
	}

-	p.tracing.setTag(ctx.initialSpan, HTTPStatusCodeTag, uint16(ctx.response.StatusCode))
+	p.tracing.setTag(ctx.initialSpan, HTTPStatusCodeTag, fmt.Sprint(uint16(ctx.response.StatusCode)))
We used attributes elsewhere but here it is "tag", why is it so?
In the case of proxy, it implements a tracing package; this package was refactored to use Otel.Tracer, but I didn't change the names of the functions in this package. In fact, I changed the minimum necessary to make it work with Otel, but if you think it is more valuable to change the nomenclature here, I can do that.
	} else {
		span.SetTag(key, value)
	}
	span.SetAttributes(attribute.String(key, value))
ensureUTF8 was lost with this change.
// OtelTracer is an implementation of opentelemetry.OtelTracer for testing. It records
// the defined spans during a series of operations.
type OtelTracer struct {
I am wondering if we should take a chance and get rid of tracing and tracing/tracingtest packages and just use otel directly (and https://pkg.go.dev/go.opentelemetry.io/otel/oteltest).
In any case why do we implement own test tracer instead of using https://pkg.go.dev/go.opentelemetry.io/otel/oteltest ?
Good catch. This is a mistake, I didn't know the oteltest package at the time. Oteltest seems to be a better solution for tracing tests from now on. But maybe we could address this in a different PR? It seems to me this would bring a lot of new changes to an already very big PR. WDYT?
The idea of this PR is to add a new tracing implementation to skipper run options that would support OpenTelemetry (Otel), instead of the deprecated OpenTracing (Ot).
The initial plan is just to support Otel tracer creation and wrap Ot tracer into a custom wrapper that would implement Otel tracer and span interface but would call Ot under the hood.
With next PRs we would start to have Otel tracers as first class citizens in our structs and interfaces enabling a skipper user to use full Otel support in all functionalities.
Changing the tracer interface affected many modules, including hot-path changes.