
Decreasing log verbosity #328

Open · nh2 opened this issue Jul 31, 2017 · 5 comments

@nh2 (Contributor) commented Jul 31, 2017

My stolon cluster is within the same data center and I'd like failovers to be detected and performed immediately, so I have set:

  • sleepInterval "1s"
  • requestTimeout "1s"
  • failInterval "1s"
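
One way to apply values like these is through the cluster specification with stolonctl update; a sketch, where the cluster name and store backend are placeholders for your setup:

stolonctl --cluster-name=mycluster --store-backend=etcd update --patch \
  '{ "sleepInterval": "1s", "requestTimeout": "1s", "failInterval": "1s" }'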

It seems to be working great, but now I get very frequent logs:

Jul 31 21:01:32 node-2 stolon-keeper[1123]: [I] 2017-07-31T21:01:32Z keeper.go:1269: our db requested role is standby followedDB=da32b492
Jul 31 21:01:32 node-2 stolon-keeper[1123]: [I] 2017-07-31T21:01:32Z keeper.go:1288: already standby
Jul 31 21:01:32 node-2 stolon-keeper[1123]: [I] 2017-07-31T21:01:32Z keeper.go:1410: postgres parameters not changed
Jul 31 21:01:33 node-2 stolon-keeper[1123]: [I] 2017-07-31T21:01:33Z keeper.go:1269: our db requested role is standby followedDB=da32b492
Jul 31 21:01:33 node-2 stolon-keeper[1123]: [I] 2017-07-31T21:01:33Z keeper.go:1288: already standby
Jul 31 21:01:33 node-2 stolon-keeper[1123]: [I] 2017-07-31T21:01:33Z keeper.go:1410: postgres parameters not changed

Is it possible to decrease the log frequency of these "everything is OK" messages without increasing the latency for failovers?

For example, it'd be great if I could configure it to print only once per minute that everything is still OK.

If not, I'd like to feature-request that.

@sgotti (Member) commented Aug 8, 2017

@nh2

The logs are printed at every check, so they can't be limited to specific intervals without adding more code. They could be moved to debug level, but I'm on the fence about doing this, since they are useful for seeing that the keeper is operating and what the current requested state is, and they help when a state change is about to happen.
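
For illustration, the extra code would be some kind of interval-based suppression, roughly along these lines (the names below are hypothetical, not stolon's actual logger):

package main

import (
    "log"
    "time"
)

// throttledLogger logs each distinct format string at most once per
// interval, so steady-state "everything is OK" lines don't flood the
// logs on fast check loops.
type throttledLogger struct {
    interval time.Duration
    last     map[string]time.Time
}

func newThrottledLogger(interval time.Duration) *throttledLogger {
    return &throttledLogger{interval: interval, last: map[string]time.Time{}}
}

func (t *throttledLogger) Infof(format string, args ...interface{}) {
    now := time.Now()
    if now.Sub(t.last[format]) < t.interval {
        return // suppressed: this message was logged too recently
    }
    t.last[format] = now
    log.Printf("[I] "+format, args...)
}

func main() {
    l := newThrottledLogger(time.Minute)
    // With a 1s check loop this prints once, not five times.
    for i := 0; i < 5; i++ {
        l.Infof("our db requested role is standby followedDB=%s", "da32b492")
        time.Sleep(time.Second)
    }
}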

As a note:

so I have set:

sleepInterval "1s"
requestTimeout "1s"
failInterval "1s"

I wouldn't suggest using these values: failInterval should be at least 3 times sleepInterval (so at least "3s" with a "1s" sleepInterval), or just one slow update from the keeper to etcd (which can happen for various reasons) will trigger a failover. Currently the proxy check interval is hardcoded to 5s and the proxy timeout to 15s, so clients will also wait up to 5 seconds during a failover before opening connections to the new master.

@nh2 (Contributor, Author) commented Aug 29, 2017

They could be moved to debug level, but I'm on the fence about doing this, since they are useful for seeing that the keeper is operating and what the current requested state is, and they help when a state change is about to happen.

I agree, they are useful and I'd like to see them regularly, but logging at every check is too much for my use case, because with fast intervals I literally have kilometers of nothing but stolon "everything is OK" messages in the logs.

Currently the proxy check interval is hardcoded to 5s and the proxy timeout to 15s, so clients will also wait up to 5 seconds during a failover before opening connections to the new master.

Would it be possible to make these configurable as well? I have very reliable 0.2 ms latency in the location where I run stolon, and my application would benefit a lot from very fast failovers. A couple of unnecessary failovers are OK in my use case (and I'd really like to test stolon failover a lot in that cluster, so they'd even be beneficial).

@sgotti (Member) commented Aug 30, 2017

I agree, they are useful and I'd like to see them regularly, but logging at every check is too much for my use case, because with fast intervals I literally have kilometers of nothing but stolon "everything is OK" messages in the logs.

The quickest way, since logs shouldn't be used to monitor application health, would be to move them to debug level.

Would it be possible to make these configurable as well? I have very reliable 0.2 ms latency in the location where I run stolon, and my application would benefit a lot from very fast failovers. A couple of unnecessary failovers are OK in my use case (and I'd really like to test stolon failover a lot in that cluster, so they'd even be beneficial).

There's already an open issue for this: #314. To make them configurable we should also add some validation to ensure the timeout is at least 3*interval. But if your store becomes slow, or there's a store master election (because a store member died), an update could take longer than the timeout, and the proxy would time out, closing all the connections.
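
For illustration, the validation could look something like this (hypothetical names, not stolon's actual code):

package main

import (
    "fmt"
    "time"
)

// validateProxyTimings rejects timings where the timeout is less than
// 3 times the check interval, since one slow store update would then
// trip the proxy and close all client connections.
func validateProxyTimings(checkInterval, timeout time.Duration) error {
    if timeout < 3*checkInterval {
        return fmt.Errorf("proxy timeout %v must be at least 3 times the check interval %v", timeout, checkInterval)
    }
    return nil
}

func main() {
    fmt.Println(validateProxyTimings(5*time.Second, 15*time.Second)) // <nil>: the current hardcoded values
    fmt.Println(validateProxyTimings(5*time.Second, 10*time.Second)) // error: violates the 3x rule
}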

@demisx commented Apr 24, 2019

We would also love to see this configurable. We currently use skaffold for development, which merges the logs from all pods into one terminal output, and all the scrolling makes it pretty much unusable. It would be great if repeating messages could be moved to DEBUG level instead.

In the meantime, I am decreasing verbosity by setting the PG log level to "FATAL" during Stolon cluster init in development, which is less than ideal, of course:

kubectl run -i -t stolonctl --image=sorintlab/stolon:v0.13.0-pg11 --restart=Never \
  --rm -- /usr/local/bin/stolonctl --cluster-name=kube-stolon --store-backend=kubernetes \
  --kube-resource-kind=configmap init -y '{"initMode": "new", "pgParameters": {"log_min_messages": "FATAL"}}'

@icefed commented Jun 25, 2019

Debug is OK, but I think it should log only when the status changes.
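
As a sketch of that idea (hypothetical names, not stolon's actual keeper code):

package main

import "log"

// logRoleIfChanged logs the requested role only when it differs from
// the last observed value, keeping steady-state check loops silent.
func logRoleIfChanged(last *string, role string) {
    if *last == role {
        return
    }
    log.Printf("[I] our db requested role is %s", role)
    *last = role
}

func main() {
    var last string
    // Simulated check loop: only the first "standby" and the switch
    // to "master" produce log lines.
    for _, role := range []string{"standby", "standby", "standby", "master"} {
        logRoleIfChanged(&last, role)
    }
}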
