agent: Reduce logging #1013

petuhovskiy · 2024-07-19T16:11:36Z

Try to reduce some very common logs:

Change interval for recurring scheduler requests (5s -> 15s)
Log 1-2 log entries instead of 4 for each scheduler request
Log successful healthchecks only once per 10 requests
Do not log "Making metrics request to VM", same info as in "Updated metrics"

ref https://github.com/neondatabase/cloud/issues/15591
ref https://github.com/neondatabase/cloud/issues/15605

sharnoff

nice! A few comments, but broadly pretty good

pkg/agent/core/state.go

sharnoff · 2024-07-19T17:42:09Z

pkg/agent/dispatcher.go

@@ -165,16 +166,29 @@ func NewDispatcher(
 		var firstSequentialFailure *time.Time
 		continuedFailureAbortTimeout := time.Second * time.Duration(runner.global.config.Monitor.MaxHealthCheckSequentialFailuresSeconds)

+		// if we don't have any errors, we will log only every 10th successful health check
+		const maxSuccessiveSkips = 10
+		var successesSkipped int


WDYT about not resetting this when we log, and instead just counting the number of sequential successes/failures and including the opposite in the logs?

(i.e., on first success, log the number of failures; and vice versa)

I didn't get the idea, do you think just counting totals would work?

I.e. to maintain totalSuccessCnt and totalFailCnt and always add them to logFields?

Then print successful healthchecks only if totalSuccessCnt % 10 == 0?

Roughly, yeah, with the additions that:

totalSuccessCnt is reset to 0 on failure
totalFailCnt is reset to 0 on success
Failure-then-success logs totalFailCnt
Success-then-failure logs totalSuccessCnt

Ok, made the change.

> Failure-then-success logs totalFailCnt
> Success-then-failure logs totalSuccessCnt

IMO it's not necessary to do this. I pushed a commit now which just counts oks/fails in a row.
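
For readers following the thread, here is a minimal sketch of the "count oks/fails in a row" idea. It is not the code that was merged: the type `healthCheckCounter`, its fields, and the method `observe` are hypothetical names chosen for illustration, and the rule "log a success only when the run length is a multiple of 10" is taken from the `% 10 == 0` suggestion earlier in the thread.

```go
package main

import "fmt"

// healthCheckCounter is a hypothetical helper (name and fields are not from
// the PR). It tracks how many health checks succeeded or failed in a row.
type healthCheckCounter struct {
	logEvery     int // log one success per logEvery consecutive successes; must be > 0
	okSequence   int // successes in a row; reset to 0 on any failure
	failSequence int // failures in a row; reset to 0 on any success
}

// observe records one health check result. It returns whether the result
// should be logged, plus a message carrying the current run length.
func (c *healthCheckCounter) observe(ok bool) (bool, string) {
	if !ok {
		c.okSequence = 0
		c.failSequence++
		// Failures are always logged.
		return true, fmt.Sprintf("health check failed, failSequence=%d", c.failSequence)
	}
	c.failSequence = 0
	c.okSequence++
	// Successes are logged only on every logEvery-th success in a row.
	shouldLog := c.okSequence%c.logEvery == 0
	return shouldLog, fmt.Sprintf("health check ok, okSequence=%d", c.okSequence)
}
```

With `logEvery` set to 10, a run of 25 consecutive successes produces two log lines (the 10th and the 20th), while every failure is logged and resets the success run.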

pkg/agent/dispatcher.go

sharnoff · 2024-07-19T17:47:33Z

pkg/agent/runner.go

+	if reqData.LastPermit != nil && *reqData.LastPermit == reqData.Resources {
+		// If the last permit is the same as the current request, we can skip request logging.
+		logger.Debug("Sending request to scheduler", zap.Any("request", reqData))
+	} else {
+		logger.Info("Sending request to scheduler", zap.Any("request", reqData))
+	}


I'd rather keep this as debug-only and put the conditional in exec_plugin.go, so that it takes place while we hold the lock (just nicer to prioritize log lines that are guaranteed to be in the same order as the operations that took place 😅)

My idea was that exec_plugin.go doesn't know exact reqBody sent to the scheduler, so request log in runner.go has more context.

Also, I'm not sure about the conditional. What is the best balance between log cost and usefulness here, in your opinion?

WDYT about such log levels and no conditions?

(1) [info] Starting plugin request (exec_plugin.go)
(2) [debug] Sending request to scheduler (runner.go)
(3) [debug] Received response from scheduler (runner.go)
(4) [info] Plugin request successful (exec_plugin.go)

> WDYT about such log levels and no conditions?

That makes sense, yeah - I think we could also change (1) to debug in the cases where it doesn't change, but it's easy enough to change that in a follow-up

> My idea was that exec_plugin.go doesn't know exact reqBody sent to the scheduler, so request log in runner.go has more context.

Yeah, that's what I was thinking at first as well, but eventually thought it'd be better to keep the guarantees around log lines being in the right order -- especially because IIRC reqBody can be exactly determined from the action given to exec_plugin.go
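
To make the effect of the proposed levels concrete, here is a rough sketch. It uses the standard library's `log/slog` as a stand-in for the zap logger that appears in the diff, and the function names (`newLogger`, `logPluginRequest`, `linesEmitted`) are hypothetical. Only the four messages and their levels come from the thread.

```go
package main

import (
	"io"
	"log/slog"
	"strings"
)

// newLogger returns a text logger that drops records below minLevel.
func newLogger(w io.Writer, minLevel slog.Level) *slog.Logger {
	return slog.New(slog.NewTextHandler(w, &slog.HandlerOptions{Level: minLevel}))
}

// logPluginRequest emits the four messages at the levels proposed in the
// thread. The trailing comments note which file each line would live in.
func logPluginRequest(logger *slog.Logger) {
	logger.Info("Starting plugin request")           // exec_plugin.go
	logger.Debug("Sending request to scheduler")     // runner.go
	logger.Debug("Received response from scheduler") // runner.go
	logger.Info("Plugin request successful")         // exec_plugin.go
}

// linesEmitted simulates one scheduler request and counts the log lines
// that survive level filtering.
func linesEmitted(debugEnabled bool) int {
	level := slog.LevelInfo
	if debugEnabled {
		level = slog.LevelDebug
	}
	var out strings.Builder
	logPluginRequest(newLogger(&out, level))
	return strings.Count(out.String(), "\n")
}
```

At info level each request leaves two lines instead of four; the two runner.go lines only appear when debug logging is enabled.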

pkg/agent/runner.go

sharnoff

lgtm, nice!

petuhovskiy added 4 commits July 19, 2024 16:29

Reduce frequency of recurring scheduler requests (5s -> 15s)

dae63a5

Reduce logs for scheduler requests

bb05638

Reduce healthcheck logs from dispatcher.go

fb3bd2f

Remove log "Making metrics request to VM"

3903101

petuhovskiy force-pushed the arthur/reduce-logs-agent branch from 4759b58 to 1bfb43e Compare July 19, 2024 17:29

petuhovskiy marked this pull request as ready for review July 19, 2024 17:33

petuhovskiy requested review from sharnoff and Omrigan July 19, 2024 17:34

sharnoff reviewed Jul 19, 2024

petuhovskiy mentioned this pull request Jul 19, 2024

agent: Reduce logs for calculated desired resources #1014

Draft

petuhovskiy force-pushed the arthur/reduce-logs-agent branch from 1bfb43e to 3903101 Compare July 19, 2024 17:55

petuhovskiy added 3 commits July 19, 2024 20:20

Fix review comments

da846b2

Remove conditional logs in scheduler reqs

8658d64

Add better healthcheck logs logic

a3cdd0b

sharnoff approved these changes Jul 19, 2024

petuhovskiy merged commit 5a5fd62 into main Jul 19, 2024
15 checks passed

petuhovskiy deleted the arthur/reduce-logs-agent branch July 19, 2024 21:52

petuhovskiy mentioned this pull request Jul 22, 2024

autoscale-scheduler: Reduce log verbosity #1001

Closed
