-
Notifications
You must be signed in to change notification settings - Fork 270
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(outbound): TCP/TLS route metrics #3355
Conversation
Signed-off-by: Zahari Dichev <[email protected]>
Signed-off-by: Zahari Dichev <[email protected]>
Signed-off-by: Zahari Dichev <[email protected]>
Signed-off-by: Zahari Dichev <[email protected]>
Because the opaq route mocks its selection logic, the first route returned by the API will always be used. This means that all subsequent routes are incoherent and cannot be selected. There's no value in preserving incoherent states in the proxy, so it's preferable to initially require that the control plane only returns one opaq route. This change removes some of the unused matching types and simplifies the opaq router to only handle one route.
Signed-off-by: Zahari Dichev <[email protected]>
Signed-off-by: Zahari Dichev <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My general sense is:
- We don't want the errno changes, or anything to do with errno in the short term.
- We should use a simple 'Error' enum with expected errors ('Forbidden', 'I/O', 'Invalid', 'Unexpected', '')--we can expand this with additional variants matching against IO error codes, but we can use a bounded set rather than one that takes a numeric code.
- If we have a constrained set of errors, we should eagerly register them rather than lazily.
- I.e. we have a TcpRouteMetricsFamily that is registered against route labels in a NewService, which produces a TcpRouteMetrics with various error counters.
- My sense is that we don't gain much from the current factoring/reusability of InstrumentConnection--we may be better off having all of the metrics and labels defined in the outbound crate. I'm not sure that the middleware portion of this is particularly resusable. Given that errors are defined in the outbound crate, it probably makes sense for all of the middleware to be bound in the outbound crate. If we do want a reusable middleware, it probabably need not be prom::Counter-oriented (i.e. it could hold a trait that calls a function on
Result<(), &Error>
).
This is all to try to avoid being generic in places that are not really advantageous.
impl<T, N, L> NewService<T> for NewInstrumentConnection<N, L> | ||
where | ||
N: Clone, | ||
L: Clone, | ||
{ | ||
type Service = InstrumentConnection<T, N, L>; | ||
|
||
fn new_service(&self, target: T) -> Self::Service { | ||
InstrumentConnection::new(target, self.inner.clone(), self.params.clone()) | ||
} | ||
} | ||
|
||
// === impl InstrumentConnection === | ||
|
||
impl<T, N, L: Clone> InstrumentConnection<T, N, L> { | ||
fn new(target: T, inner: N, params: TcpMetricsParams<L>) -> Self { | ||
Self { | ||
target, | ||
inner, | ||
params, | ||
} | ||
} | ||
} | ||
|
||
impl<T, I, N, S, L> Service<I> for InstrumentConnection<T, N, L> | ||
where | ||
L: Clone + Hash + Eq + EncodeLabelSetMut + Debug + Send + Sync + 'static, | ||
T: Clone + Send + Sync + 'static, | ||
T: Param<L>, | ||
I: io::AsyncRead + io::AsyncWrite + Debug + Send + Unpin + 'static, | ||
N: NewService<T, Service = S> + Clone + Send + 'static, | ||
S: Service<I> + Send, | ||
S::Error: Into<Error>, | ||
S::Future: Send, | ||
{ | ||
type Response = S::Response; | ||
type Error = Error; | ||
type Future = Pin<Box<dyn Future<Output = Result<S::Response, Error>> + Send + 'static>>; | ||
|
||
#[inline] | ||
fn poll_ready(&mut self, _: &mut Context<'_>) -> Poll<Result<(), Self::Error>> { | ||
Poll::Ready(Ok(())) | ||
} | ||
|
||
fn call(&mut self, io: I) -> Self::Future { | ||
const ERRNO_UNKNOWN: i32 = 0; | ||
let target = self.target.clone(); | ||
let new_accept = self.inner.clone(); | ||
let labels: L = target.param(); | ||
let params = self.params.clone(); | ||
|
||
Box::pin(async move { | ||
let metrics_open = params.metrics_open(labels.clone()); | ||
let svc = new_accept.new_service(target); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be structured such that we don't "break" the stacking--the NewService should eagerly do as much work as possible and this future should only handle registering completion metrics.
Signed-off-by: Zahari Dichev <[email protected]>
let metrics = self.metrics.clone(); | ||
|
||
Box::pin(async move { | ||
let svc = new_accept.new_service(target); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't this happen in the NewService impl so that this middleware can properly proxy poll_ready()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will give it a try
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in 2549251
Signed-off-by: Zahari Dichev <[email protected]>
let inner = self.inner.clone(); | ||
|
||
Box::pin(async move { | ||
metrics.inc_open(); | ||
match inner.oneshot(io).await.map_err(Into::into) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let inner = self.inner.clone(); | |
Box::pin(async move { | |
metrics.inc_open(); | |
match inner.oneshot(io).await.map_err(Into::into) { | |
metrics.inc_open(); | |
let call = self.inner.call(io); | |
Box::pin(async move { | |
match call.await.map_err(Into::into) { |
The clone/oneshot invalidates the prior poll_ready -- poll_ready "reserves" us the ability to send a request. We want to use that reservation. When we clone, we invalidate that reservation. Oneshot combines waiting for a reservation and dispatching the request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM once we fix the service middleware. |
Signed-off-by: Zahari Dichev <[email protected]>
This PR adds transport level route metrics for TCP and TLS. The metrics added are:
Signed-off-by: Zahari Dichev [email protected]