
autoscale-scheduler: Reduce log verbosity #1001

Closed
MMeent opened this issue Jul 4, 2024 · 3 comments
Labels
m/good_first_issue Moment: when doing your first Neon contributions

Comments


MMeent commented Jul 4, 2024

Problem description / Motivation

About 95% of the autoscale-scheduler logs are generated in the plugin/run.go file, split almost evenly (roughly 25% each) across four log call sites:

While these logs occasionally contain interesting information, many are insignificant, like the following, where no changes to the state of the system are requested at all:

{"level":"info","ts":1720120641.3131783,"logger":"autoscale-scheduler.agent-handler","caller":"plugin/run.go:304","msg":"Handled requested resources from pod","pod":{"namespace":"default","name":"compute-wild-sun-a2smlmvm-gkvkn"},"virtualmachine":{"namespace":"default","name":"compute-wild-sun-a2smlmvm"},"node":"i-0295ac5cc12b7839a.eu-central-1.compute.internal","verdict":{"cpu":"Register 0.25 -> 0.25 (pressure 0 -> 0); node reserved 7.812 -> 7.812 (of 127.61), node capacityPressure 0 -> 0 (0 -> 0 spoken for)","mem":"Register 1Gi -> 1Gi (pressure 0 -> 0); node reserved 26720Mi -> 26720Mi (of 519498520Ki), node capacityPressure 0 -> 0 (0 -> 0 spoken for)"}}

Feature idea(s) / DoD

Filter in the application so that only interesting logs are emitted, or stop logging at these locations entirely. Eliminating even one or two of these log sites would reduce the log volume by a good margin.

Implementation ideas

  • move no-op changes (such as what tends to happen at plugin/run.go:304) to debug level
  • move internal state update log messages (e.g., metrics) to debug level
  • merge non-critical messages into a single "this all happened while processing this message" uber-logline
  • if these messages are used to determine timing, log only those requests that take (e.g.) >10ms to process
@stradig stradig added the m/good_first_issue Moment: when doing your first Neon contributions label Jul 16, 2024
@stradig stradig assigned petuhovskiy and unassigned sharnoff Jul 16, 2024
@petuhovskiy
Member

Opened a PR to log fewer lines per request: #1015
Also, #1013 reduced the default frequency of agent->plugin requests from 5s to 15s. This should probably be enough for now; can we close this issue after the release?

@petuhovskiy
Member

I checked the log volume explorer in Grafana, and scheduler logs are currently less frequent than those of the controller, agent, runner, and proxy. Closing this issue; the next iteration of log reduction can use https://github.com/neondatabase/cloud/issues/15605.

Author

MMeent commented Jul 29, 2024

[screenshot: log billing metrics chart]
Yep, these reductions are quite visible in our log billing metrics: the daily average dropped from 80-ish to 50-ish.
