Introduce EventsExecutor implementation #1391
base: rolling
Conversation
Draft form of this feature to drive discussion only. I've done some minimal work on this relative to the original code to get things in roughly the right shape; however the functional test currently seems to explode, and a bunch of the linters are unhappy. Even once this nominally works, there's a fair bit more work to be done on it, starting with the removal of the boost dependency.
Signed-off-by: Brad Martin <[email protected]>
The previous version worked on 22.04+Iron, but fails on 24.04+Jazzy or Rolling. It seems like the behavior of std::bind() may have changed when asked to bind a py::object to a callback taking a py::handle. Signed-off-by: Brad Martin <[email protected]>
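For context, a minimal sketch of the kind of binding in question (HandleEvent and MakeCallback are hypothetical stand-ins, not symbols from this PR): capturing the py::object in a lambda keeps the object lifetime and the py::object-to-py::handle conversion explicit, instead of relying on how std::bind forwards its stored argument.

#include <functional>
#include <pybind11/pybind11.h>

namespace py = pybind11;

// Hypothetical callback that only borrows the Python object.
void HandleEvent(py::handle obj) {(void)obj;}

std::function<void()> MakeCallback(py::object obj)
{
  // Capturing by value holds a strong reference to the Python object for as long as the
  // returned callable lives (assumes the GIL is held when the callable is copied or
  // destroyed), and passes it to HandleEvent as a py::handle at call time.
  return [obj]() {HandleEvent(obj);};
}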
Force-pushed from 039907a to 18036c4.
Signed-off-by: Brad Martin <[email protected]>
Signed-off-by: Brad Martin <[email protected]>
Force-pushed from 6a7c1e7 to 68ecfb4.
Signed-off-by: Brad Martin <[email protected]>
Signed-off-by: Brad Martin <[email protected]>
clang-format fixes things that uncrustify tends to leave alone, but then uncrustify seems slightly unhappy with that result. Also reflow comments at 100 columns. This uses the .clang-format file from the ament-clang-format package, with the exception of SortIncludes being set to false, because I didn't like what it tried to do with includes without that override. Signed-off-by: Brad Martin <[email protected]>
Signed-off-by: Brad Martin <[email protected]>
Signed-off-by: Brad Martin <[email protected]>
Tests that currently work are enabled, and those that don't are annotated with what needs to be done before they can be enabled. Signed-off-by: Brad Martin <[email protected]>
Signed-off-by: Brad Martin <[email protected]>
Signed-off-by: Brad Martin <[email protected]>
Signed-off-by: Brad Martin <[email protected]>
std::string("Failed to set timer reset callback: ") + rcl_get_error_string().str); | ||
} | ||
timers_[timer] = ready_callback; | ||
if (timers_.size() == WARN_TIMERS_COUNT) { |
Suggested change:
-  if (timers_.size() == WARN_TIMERS_COUNT) {
+  if (timers_.size() >= WARN_TIMERS_COUNT) {
So with this change, you'd get the warning re-printed with each additional timer beyond 8, rather than just once, which I was trying to avoid. Granted, nothing stops you from creating seven timers, and then repeatedly creating and destroying the 8th, and getting the warning repeatedly anyway.
I would consider more warnings a better behavior. The rationale being that nodes tend to be spammy with their output, and you could easily miss the warning. Also, the timer management overhead should get worse with every additional timer, therefore I would say it's worth printing an additional warning.
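Neither behavior is obviously right; purely to illustrate a possible middle ground (a sketch with made-up names, not code from this PR), one could warn whenever the timer count reaches a new high-water mark, so repeatedly recreating the 8th timer warns only once, while every genuinely new timer beyond the threshold still warns:

#include <cstddef>
#include <cstdio>

constexpr size_t WARN_TIMERS_COUNT = 8;

// Called after each timer is added; highest_warned_count starts at zero and persists
// across calls.
void MaybeWarnTimerCount(size_t current_count, size_t & highest_warned_count)
{
  if (current_count >= WARN_TIMERS_COUNT && current_count > highest_warned_count) {
    std::fprintf(
      stderr, "Now managing %zu timers; timer handling overhead may become significant\n",
      current_count);
    highest_warned_count = current_count;
  }
}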
if (next_call_ns <= 0) {
  // This just notifies RCL that we're considering the timer triggered, for the purposes of
  // updating the next trigger time.
  const auto ret = rcl_timer_call(timer_cb_pair.first);
This happens directly before the execution of the timer in the C++ version. I wonder if doing it here introduces some unwanted behavior...
None that I've encountered yet? It just seemed important to get started on the next interval as quickly as possible to keep the timer rate from drifting due to the latency in dispatching the timer. Are there any specific issues you suspect with this?
I remember some tests failing around this detail in my implementations; sadly I don't remember the specifics. I guess the behavior around cancel is broken now.
E.g.:
A callback taking some time is running.
The timer becomes ready and is added to the ready queue.
Some other callback cancels the timer.
The timer is executed anyway.
Is there a check I missed that the timer was canceled directly before the execution?
The interval should not drift if this call is issued later. The reason for this is that the next trigger time is computed as:
while (nextTime < now)
{
  nextTime = nextTime + period;
}
I think there is another bug lurking here. If a timer takes longer than its period, your queue will fill up. You need to block the timer from recomputing the next execution time until it has been executed.
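For what it's worth, the cancel-after-queueing case could be guarded with a last-moment check before dispatch, roughly like this (a sketch only; DispatchTimer and user_callback are placeholders, not names from this PR):

#include <functional>

#include <rcl/timer.h>

void DispatchTimer(rcl_timer_t * timer, const std::function<void()> & user_callback)
{
  bool canceled = false;
  // Re-check cancellation right before running the user callback, since the timer may
  // have been canceled after it was queued as ready.
  if (rcl_timer_is_canceled(timer, &canceled) != RCL_RET_OK || canceled) {
    return;
  }
  user_callback();
}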
}
it->second->RemoveTimer(timer);
if (it->second->empty()) {
  clock_managers_.erase(it);
Hm, isn't there a race here, where the asio callback could still fire into the now-deleted clock_manager?
I think the timer's own callback is safe, because it calls a user callback directly without capturing the clock manager itself. However, it may be possible for HandleJump() and UpdateTimers() to get called with a deleted this pointer, if you remove the last timer from a clock immediately after a callback was posted, and before the callback is dispatched. I'll make a note to look at this closer later.
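One way to make such posted callbacks safe (a sketch under the assumption that the clock manager is already owned by a shared_ptr; none of these names are from the PR) is to capture a weak_ptr instead of a raw this:

#include <functional>
#include <memory>

class ClockManager : public std::enable_shared_from_this<ClockManager>
{
public:
  // `post` stands in for however the executor schedules work on its event loop.
  void PostUpdateTimers(const std::function<void(std::function<void()>)> & post)
  {
    std::weak_ptr<ClockManager> weak_self = weak_from_this();
    post([weak_self]() {
        // If the last timer was removed and the manager erased before this runs, the
        // lock fails and we do nothing instead of touching a deleted object.
        if (auto self = weak_self.lock()) {
          self->UpdateTimers();
        }
      });
  }

private:
  void UpdateTimers() {/* recompute timer deadlines */}
};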
std::string("Failed to set timer reset callback: ") + rcl_get_error_string().str); | ||
} | ||
timers_[timer] = ready_callback; | ||
if (timers_.size() == WARN_TIMERS_COUNT) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would consider more warning a better behavior. The rational being, that nodes tend to be spammy with their output, and you could easily miss the warning. Also the management should get worse with every additional added timer, therefore I would say its worth printing an additional warning.
// objects you're working with. We'll try to figure out what kind of stuff is hiding behind the
// abstraction by having the Waitable add itself to a wait set, then take stock of what all ended
// up there. We'll also have to hope that no Waitable implementations ever change their component
// entities over their lifetimes.
Hm, the rclcpp layer explicitly forces the waitables to implement event handling. I don't know much about the rclpy waitable internals, but the approach taken here would break the rclcpp waitables, so we should check this... A good candidate to have a look at would be the action interfaces.
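To make the probing idea in that comment concrete, here is a rough sketch (the function name, the fixed capacities, and looking only at guard conditions are illustrative assumptions, not this PR's code): the Waitable is asked to register itself into a throwaway wait set, and we then inspect which slots it actually filled.

#include <rcl/allocator.h>
#include <rcl/wait.h>

// `add_to_wait_set` stands in for whatever callable lets the Waitable register itself.
template<typename AddToWaitSetFn>
bool WaitableAddsGuardConditions(AddToWaitSetFn && add_to_wait_set, rcl_context_t * context)
{
  rcl_wait_set_t wait_set = rcl_get_zero_initialized_wait_set();
  // Fixed capacities keep the sketch simple; a real implementation would size this properly.
  if (
    rcl_wait_set_init(&wait_set, 8, 8, 8, 8, 8, 8, context, rcl_get_default_allocator()) !=
    RCL_RET_OK)
  {
    return false;
  }
  rcl_wait_set_clear(&wait_set);  // Null out every entry slot before probing.
  add_to_wait_set(&wait_set);
  // Count what the Waitable actually registered.
  size_t guard_conditions_added = 0;
  for (size_t i = 0; i < wait_set.size_of_guard_conditions; ++i) {
    if (wait_set.guard_conditions[i] != nullptr) {
      ++guard_conditions_added;
    }
  }
  rcl_wait_set_fini(&wait_set);
  return guard_conditions_added > 0;
}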
It was being used only inconsistently, and on reflection it wasn't adding any protection beyond what the GIL already provides. Signed-off-by: Brad Martin <[email protected]>
Needed to release the GIL while blocking on another lock. Signed-off-by: Brad Martin <[email protected]>
Closes #1389