-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
agent: Reduce logging #1013
agent: Reduce logging #1013
Conversation
4759b58
to
1bfb43e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice! A few comments, but broadly pretty good
pkg/agent/dispatcher.go
Outdated
@@ -165,16 +166,29 @@ func NewDispatcher( | |||
var firstSequentialFailure *time.Time | |||
continuedFailureAbortTimeout := time.Second * time.Duration(runner.global.config.Monitor.MaxHealthCheckSequentialFailuresSeconds) | |||
|
|||
// if we don't have any errors, we will log only every 10th successful health check | |||
const maxSuccessiveSkips = 10 | |||
var successesSkipped int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WDYT about not resetting this when we log, and instead just counting the number of sequential successes/failures and including the opposite in the logs?
(i.e., on first success, log the number of failures; and vice versa)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't get the idea, do think just counting totals would work?
I.e. to maintain totalSuccessCnt
and totalFailCnt
and always add them to logFields
?
Then print successful healthchecks only if totalSuccessCnt % 10 == 0
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Roughly, yeah- with the additions that
totalSuccessCnt
is reset to 0 on failuretotalFailCnt
is reset to 0 on success- Failure-then-success logs
totalFailCnt
- Success-then-failure logs
totalSuccessCnt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, made the change.
Failure-then-success logs totalFailCnt
Success-then-failure logs totalSuccessCnt
IMO it's not necessary to do this. I pushed a commit now which just counts oks/fails in a row.
pkg/agent/runner.go
Outdated
if reqData.LastPermit != nil && *reqData.LastPermit == reqData.Resources { | ||
// If the last permit is the same as the current request, we can skip request logging. | ||
logger.Debug("Sending request to scheduler", zap.Any("request", reqData)) | ||
} else { | ||
logger.Info("Sending request to scheduler", zap.Any("request", reqData)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather keep this as debug-only and put the conditional in exec_plugin.go
, so that it takes place while we hold the lock (just nicer to prioritize log lines that are guaranteed to be in the same order as the operations that took place 😅)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My idea was that exec_plugin.go
doesn't know exact reqBody
sent to the scheduler, so request log in runner.go
has more context.
Also I'm not sure about the conditional, what is the best balance between logs cost / usefulness here in your opinion?
WDYT about such log levels and no conditions?
- [info] Starting plugin request (exec_plugin.go)
- [debug] Sending request to scheduler (runner.go)
- [debug] Received response from scheduler (runner.go)
- [info] Plugin request successful (exec_plugin.go)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WDYT about such log levels and no conditions?
That makes sense, yeah - I think we could also change (1) to debug in the cases where it doesn't change, but it's easy enough to change that in a follow-up
My idea was that exec_plugin.go doesn't know exact
reqBody
sent to the scheduler, so request log inrunner.go
has more context.
Yeah, that's what I was thinking first as well, but eventually thought it'd be better to keep the guarantees around log lines being in the right order -- especially because IIRC reqBody
can be exactly determined from the action given to exec_plugin.go
1bfb43e
to
3903101
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, nice!
Try to reduce some very common logs:
ref https://github.com/neondatabase/cloud/issues/15591
ref https://github.com/neondatabase/cloud/issues/15605