feat(server): abort on panic #4026
Changes from 2 commits
```diff
@@ -278,8 +278,10 @@ mod testutils;

 use std::sync::Arc;

+use futures::StreamExt;
 use relay_config::Config;
 use relay_system::{Controller, Service};
+use tokio::select;

 use crate::service::ServiceState;
 use crate::services::server::HttpServer;
@@ -301,9 +303,30 @@ pub fn run(config: Config) -> anyhow::Result<()> {
     // information on all services.
     main_runtime.block_on(async {
         Controller::start(config.shutdown_timeout());
-        let service = ServiceState::start(config.clone())?;
+        let (service, mut join_handles) = ServiceState::start(config.clone())?;
         HttpServer::new(config, service.clone())?.start();
-        Controller::shutdown_handle().finished().await;
+
+        loop {
+            select! {
+                Some(res) = join_handles.next() => {
+                    match res {
+                        Ok(()) => {
+                            relay_log::trace!("Service exited normally.");
+                        }
+                        Err(e) => {
+                            if e.is_panic() {
+                                std::panic::resume_unwind(e.into_panic());
+                            }
+                        }
+                    }
+                }
+                _ = Controller::shutdown_handle().finished() => {
+                    break
+                }
+                else => break
+            }
+        }
+
         anyhow::Ok(())
     })?;
```

Review comment on the `Controller::shutdown_handle().finished()` arm:

> Note: when every service implements a shutdown listener, awaiting on
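For readers skimming the diff, here is a minimal, self-contained sketch of the pattern this hunk introduces. It assumes `join_handles` is a `futures::stream::FuturesUnordered` of the `JoinHandle`s returned by each service (the PR does not show the new body of `ServiceState::start`), and it substitutes a plain `watch` channel for `Controller::shutdown_handle()`:

```rust
use futures::stream::{FuturesUnordered, StreamExt};
use tokio::select;
use tokio::sync::watch;
use tokio::task::JoinHandle;

#[tokio::main]
async fn main() {
    // Hypothetical stand-in for Controller::shutdown_handle(): a watch
    // channel that flips to `true` when shutdown is requested.
    let (_shutdown_tx, mut shutdown_rx) = watch::channel(false);

    // What ServiceState::start now effectively returns: one JoinHandle per
    // spawned service, collected into a stream of completions.
    let mut join_handles: FuturesUnordered<JoinHandle<()>> = FuturesUnordered::new();
    join_handles.push(tokio::spawn(async {
        // A well-behaved service that exits on its own.
    }));
    join_handles.push(tokio::spawn(async {
        panic!("service crashed");
    }));

    loop {
        select! {
            Some(res) = join_handles.next() => match res {
                Ok(()) => println!("service exited normally"),
                // Re-raise the panic on the main task so the whole process
                // aborts instead of limping along without the service.
                Err(e) if e.is_panic() => std::panic::resume_unwind(e.into_panic()),
                Err(_) => {} // task was cancelled, not panicked
            },
            // A real server would send `true` on _shutdown_tx to get here.
            _ = shutdown_rx.changed() => break,
            else => break, // mirrors the PR: bail out if every branch is disabled
        }
    }
}
```

The key design point is `std::panic::resume_unwind`: instead of logging the `JoinError` and continuing without the crashed service, the panic is re-raised on the main task and the process terminates, which is the abort-on-panic behavior the PR title promises.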
```diff
@@ -4,6 +4,7 @@ use relay_config::Config;
 use relay_system::{Addr, AsyncResponse, Controller, FromMessage, Interface, Sender, Service};
 use std::future::Future;
 use tokio::sync::watch;
+use tokio::task::JoinHandle;
 use tokio::time::{timeout, Instant};

 use crate::services::metrics::RouterHandle;
@@ -189,13 +190,13 @@ impl HealthCheckService {
 impl Service for HealthCheckService {
     type Interface = HealthCheck;

-    fn spawn_handler(mut self, mut rx: relay_system::Receiver<Self::Interface>) {
+    fn spawn_handler(mut self, mut rx: relay_system::Receiver<Self::Interface>) -> JoinHandle<()> {
         let (update_tx, update_rx) = watch::channel(StatusUpdate::new(Status::Unhealthy));
         let check_interval = self.config.health_refresh_interval();
         // Add 10% buffer to the internal timeouts to avoid race conditions.
         let status_timeout = (check_interval + self.config.health_probe_timeout()).mul_f64(1.1);

-        tokio::spawn(async move {
+        let j1 = tokio::spawn(async move {
             let shutdown = Controller::shutdown_handle();

             while shutdown.get().is_none() {
@@ -212,7 +213,7 @@ impl Service for HealthCheckService {
             update_tx.send(StatusUpdate::new(Status::Unhealthy)).ok();
         });

-        tokio::spawn(async move {
+        let _j2 = tokio::spawn(async move {
             while let Some(HealthCheck(message, sender)) = rx.recv().await {
                 let update = update_rx.borrow();
@@ -225,6 +226,8 @@ impl Service for HealthCheckService {
                 });
             }
         });
+
+        j1 // TODO: should return j1 + j2
     }
 }
```

Review comment on the `j1` TODO:

> We have a few places where the spawn handler spawns more than one task. In a follow-up, we should transform these to something like:
>
> ```rust
> tokio::spawn(async {
>     let subtask = tokio::spawn(async { /* ... */ });
>     // ...
>     subtask.await;
> });
> ```
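A hedged sketch of what that follow-up could look like, reduced to the task structure only (the `spawn_handler` shape and names here are illustrative, not Relay's actual API):

```rust
use tokio::task::JoinHandle;

// Hypothetical shape of the follow-up: the outer task owns the subtask, so
// the single returned handle covers both of them.
fn spawn_handler() -> JoinHandle<()> {
    tokio::spawn(async {
        // Stand-in for the second task (e.g. the message loop on `rx`).
        let subtask: JoinHandle<()> = tokio::spawn(async {
            // ... handle messages ...
        });

        // ... the service's main loop would run here ...

        // Surface a subtask panic through the outer handle, which the
        // join_handles loop in run() then turns into a process abort.
        if let Err(e) = subtask.await {
            if e.is_panic() {
                std::panic::resume_unwind(e.into_panic());
            }
        }
    })
}

#[tokio::main]
async fn main() {
    spawn_handler().await.unwrap();
}
```

One trade-off of this shape: a panic in the subtask only surfaces once the outer task reaches `subtask.await`, so a service whose main loop runs indefinitely would want to `select!` over the subtask and its own loop instead.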
Review discussion on this change:

> Do we maybe want, in a future iteration, to define a respawn behavior for services? It might be tricky to make sure existing channels are re-set up.

> This actually re-triggers the panic and makes the process terminate. Respawning services is another option I would like to discuss on Monday, but it has its drawbacks (what if the service keeps panicking on every re-spawn?).

> Yes, this is something I thought of. I feel like for that we should have some global retry counters or heuristics to know when it is no longer possible to restart a service.
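To make the retry-counter idea concrete, here is a purely hypothetical supervisor sketch; the PR itself deliberately aborts instead of respawning, and nothing like this exists in the diff:

```rust
use tokio::task::JoinHandle;

// Hypothetical supervisor: respawn a panicking service until a retry budget
// is exhausted, then propagate the last panic (i.e. fall back to the
// abort-on-panic behavior this PR introduces).
async fn supervise<F>(mut respawn: F, max_restarts: u32)
where
    F: FnMut() -> JoinHandle<()>,
{
    let mut restarts = 0;
    loop {
        match respawn().await {
            // Clean exit: nothing to respawn.
            Ok(()) => return,
            Err(e) if e.is_panic() => {
                if restarts >= max_restarts {
                    // Retry budget exhausted: give up and abort the process.
                    std::panic::resume_unwind(e.into_panic());
                }
                restarts += 1;
                eprintln!("service panicked, respawning ({restarts}/{max_restarts})");
            }
            // Cancelled, not panicked.
            Err(_) => return,
        }
    }
}

#[tokio::main]
async fn main() {
    supervise(|| tokio::spawn(async { panic!("flaky service") }), 3).await;
}
```

As the thread notes, the hard part in practice is re-creating the crashed service's channels (its `Addr` and `Receiver`) and whatever state its peers hold, which this sketch glosses over entirely.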