-
Notifications
You must be signed in to change notification settings - Fork 722
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
core: fix missed register_callsite
error
#2938
base: v0.1.x
Are you sure you want to change the base?
Conversation
f8db879
to
8b6dca6
Compare
There are 2 triggers which will cause a subscriber to receive a call to `Subscriber::register_callsite` for a specific callsite. 1. The first time the event or span at that callsite is executed. 2. When a new subscriber is added or removed (for example, calls to `set_default` or `with_default`) It is trigger (2) that will cause a new subscriber to receive `Subscriber::register_callsite` for all the callsites which had already been registered before it became active. When a callsite is registered for trigger (1), the callsite starts in state `UNREGISTERED`. The first thread to encounter the callsite will transition it to `REGISTERING` and determine the overall interest for the callsite by registering with all known dispatchers (which will call into `Subscriber::register_callsite`). Once that is complete, the callsite is added to the list of all known callsites and its state is transitioned to `REGISTERED`. is (re)built for all known dispatchers. The callsite starts in state `UNREGISTERED`. The This calls down into `Subscriber::register_callsite` for each subscriber. Once that is complete, the callsite is added to the global list of known callsites. While the callsite interest is being rebuilt, other threads that encounter the callsite will be given `Interest::sometimes()` until the registration is complete. However, if a new subscriber is added during this window, all the interest for all callsites will be rebuilt, but because the new callsite (in state `REGISTERING`) won't be included because it isn't yet in the global list of callsites. This can cause a case where that new subscriber being added won't receive `Subscriber::register_callsite` before it receives the subsequent call to `Subscriber::event` or `Subscriber::new_span`. The documentation on [Registering Callsites] is not very explicit on this point, but it does suggest that `Subscriber::register_callsite` will be called before the call to either `Subscriber::event` or `Subscriber::new_span`, and the current behavior can break this implicit contract. [Registering Callsites]: https://docs.rs/tracing-core/0.1.32/tracing_core/callsite/index.html#registering-callsites This change swaps the order of rebuilding the callsite interest and adding the callsite to the global list so that the callsite gets pushed first, avoiding this window in which a subscriber won't get a call to `register_callsite`. As such, a callsite may have its interest read before it is set. In this case, the existing implementation will return `Interest::sometimes()` for the `DefaultCallsite` implementation. Other implementations (outside of the `tracing` project) may perform this differently, but in this case, there is no documented guarantee regarding the ordering. A regression test is included which provokes the race condition 100% of the time before the changes in this fix. Fixes: #2743
8b6dca6
to
51d7d9e
Compare
@hawkw I'd love to get your opinion on the change I've made in the The test in |
Thanks for the ping! I'll try to give it a look today or tomorrow. |
@hds you mentioned the docs not being explicit on the topic, could you maybe clarify them? You just might be the second most knowledgeable person on the topic now. |
Note: This is a change against the
v0.1.x
branch.Motivation
There are 2 triggers which will cause a subscriber to receive a call to
Subscriber::register_callsite
for a specific callsite.set_default
orwith_default
)It is trigger (2) that will cause a new subscriber to receive
Subscriber::register_callsite
for all the callsites which had alreadybeen registered before it became active.
When a callsite is registered for trigger (1), the callsite starts in
state
UNREGISTERED
.The first thread to encounter the callsite will transition it to
REGISTERING
and determine the overall interest for the callsite byregistering with all known dispatchers (which will call into
Subscriber::register_callsite
).Once that is complete, the callsite is added to the list of all known
callsites and its state is transitioned to
REGISTERED
.is (re)built for all known dispatchers. The callsite starts in state
UNREGISTERED
. The This calls down intoSubscriber::register_callsite
for each subscriber. Once that iscomplete, the callsite is added to the global list of known callsites.
While the callsite interest is being rebuilt, other threads that
encounter the callsite will be given
Interest::sometimes()
until theregistration is complete. However, if a new subscriber is added during
this window, all the interest for all callsites will be rebuilt, but
because the new callsite (in state
REGISTERING
) won't be includedbecause it isn't yet in the global list of callsites.
This can cause a case where that new subscriber being added won't
receive
Subscriber::register_callsite
before it receives the subsequentcall to
Subscriber::event
orSubscriber::new_span
.The documentation on Registering Callsites is not very explicit on
this point, but it does suggest that
Subscriber::register_callsite
will be called before the call to either
Subscriber::event
orSubscriber::new_span
, and the current behavior can break this implicitcontract.
Solution
This change swaps the order of rebuilding the callsite interest and
adding the callsite to the global list so that the callsite gets pushed
first, avoiding this window in which a subscriber won't get a call to
register_callsite
.As such, a callsite may have its interest read before it is set. In this
case, the existing implementation will return
Interest::sometimes()
for the
DefaultCallsite
implementation. Other implementations (outsideof the
tracing
project) may perform this differently, but in thiscase, there is no documented guarantee regarding the ordering.
A regression test is included which provokes the race condition 100% of
the time before the changes in this fix.
Fixes: #2743