
Decreasing log verbosity #328

Open · nh2 opened this issue Jul 31, 2017 · 5 comments

@nh2 (Contributor) commented Jul 31, 2017

My stolon cluster is within the same data center and I'd like failovers to be detected and performed immediately, so I have set:

  • sleepInterval "1s"
  • requestTimeout "1s"
  • failInterval "1s"
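
One way to apply values like these is through the cluster specification with stolonctl update; a sketch, where the cluster name and store backend are placeholders for your setup:

stolonctl --cluster-name=mycluster --store-backend=etcd update --patch \
  '{ "sleepInterval": "1s", "requestTimeout": "1s", "failInterval": "1s" }'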

It seems to be working great, but now I get very frequent logs:

Jul 31 21:01:32 node-2 stolon-keeper[1123]: [I] 2017-07-31T21:01:32Z keeper.go:1269: our db requested role is standby followedDB=da32b492
Jul 31 21:01:32 node-2 stolon-keeper[1123]: [I] 2017-07-31T21:01:32Z keeper.go:1288: already standby
Jul 31 21:01:32 node-2 stolon-keeper[1123]: [I] 2017-07-31T21:01:32Z keeper.go:1410: postgres parameters not changed
Jul 31 21:01:33 node-2 stolon-keeper[1123]: [I] 2017-07-31T21:01:33Z keeper.go:1269: our db requested role is standby followedDB=da32b492
Jul 31 21:01:33 node-2 stolon-keeper[1123]: [I] 2017-07-31T21:01:33Z keeper.go:1288: already standby
Jul 31 21:01:33 node-2 stolon-keeper[1123]: [I] 2017-07-31T21:01:33Z keeper.go:1410: postgres parameters not changed

Is it possible to decrease the log frequency of these "everything is OK" messages without increasing the latency for failovers?

For example, it'd be great if I could configure it to print only once per minute that everything is still OK.

If not, I'd like to feature-request that.

@sgotti (Member) commented Aug 8, 2017

@nh2

The logs are printed at every check, so they can't be limited to specific intervals without adding more code. They could be moved to debug level, but I'm on the fence about doing this, since they are useful for seeing that the keeper is operating and what the current requested state is, and they help when a state change is about to happen.
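
For illustration, the extra code would be some kind of interval-based suppression, roughly along these lines (the names below are hypothetical, not stolon's actual logger):

package main

import (
    "log"
    "time"
)

// throttledLogger logs each distinct format string at most once per
// interval, so steady-state "everything is OK" lines don't flood the
// logs on fast check loops.
type throttledLogger struct {
    interval time.Duration
    last     map[string]time.Time
}

func newThrottledLogger(interval time.Duration) *throttledLogger {
    return &throttledLogger{interval: interval, last: map[string]time.Time{}}
}

func (t *throttledLogger) Infof(format string, args ...interface{}) {
    now := time.Now()
    if now.Sub(t.last[format]) < t.interval {
        return // suppressed: this message was logged too recently
    }
    t.last[format] = now
    log.Printf("[I] "+format, args...)
}

func main() {
    l := newThrottledLogger(time.Minute)
    // With a 1s check loop this prints once, not five times.
    for i := 0; i < 5; i++ {
        l.Infof("our db requested role is standby followedDB=%s", "da32b492")
        time.Sleep(time.Second)
    }
}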

As a note:

so I have set:

sleepInterval "1s"
requestTimeout "1s"
failInterval "1s"

I wouldn't suggest using these values: failInterval should be at least 3 times sleepInterval (so at least "3s" with a "1s" sleepInterval), or just one slow update from the keeper to etcd (which can happen for various reasons) will trigger a failover. Currently the proxy check interval is hardcoded to 5s and the proxy timeout to 15s, so clients will also wait up to 5 seconds during a failover before opening connections to the new master.

@nh2 (Contributor, Author) commented Aug 29, 2017

They could be moved to debug level, but I'm on the fence about doing this, since they are useful for seeing that the keeper is operating and what the current requested state is, and they help when a state change is about to happen.

I agree, they are useful and I'd like to see them regularly, but logging at every check is too much for my use case, because with fast intervals I literally have kilometers of nothing but stolon "everything is OK" messages in the logs.

Currently the proxy check interval is hardcoded to 5s and the proxy timeout to 15s, so clients will also wait up to 5 seconds during a failover before opening connections to the new master.

Would it be possible to make these configurable as well? I have very reliable 0.2 ms latency in the location where I run stolon, and my application would benefit a lot from very fast failovers. A couple of unnecessary failovers are OK in my use case (and I'd really like to test stolon failover a lot in that cluster, so they'd even be beneficial).

@sgotti (Member) commented Aug 30, 2017

I agree, they are useful and I'd like to see them regularly, but logging at every check is too much for my use case, because with fast intervals I literally have kilometers of nothing but stolon "everything is OK" messages in the logs.

The quickest way, since logs shouldn't be used to monitor application health, would be to move them to debug level.

Would it be possible to make these configurable as well? I have very reliable 0.2 ms latency in the location where I run stolon, and my application would benefit a lot from very fast failovers. A couple of unnecessary failovers are OK in my use case (and I'd really like to test stolon failover a lot in that cluster, so they'd even be beneficial).

There's already an open issue for this: #314. To make them configurable we should also add some validation to ensure the timeout is at least 3*interval. But if your store becomes slow, or there's a store master election (because a store member died), an update could take longer than the timeout, and the proxy would time out, closing all the connections.
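
For illustration, the validation could look something like this (hypothetical names, not stolon's actual code):

package main

import (
    "fmt"
    "time"
)

// validateProxyTimings rejects timings where the timeout is less than
// 3 times the check interval, since one slow store update would then
// trip the proxy and close all client connections.
func validateProxyTimings(checkInterval, timeout time.Duration) error {
    if timeout < 3*checkInterval {
        return fmt.Errorf("proxy timeout %v must be at least 3 times the check interval %v", timeout, checkInterval)
    }
    return nil
}

func main() {
    fmt.Println(validateProxyTimings(5*time.Second, 15*time.Second)) // <nil>: the current hardcoded values
    fmt.Println(validateProxyTimings(5*time.Second, 10*time.Second)) // error: violates the 3x rule
}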

@demisx commented Apr 24, 2019

We would also love to see this configurable. We currently use skaffold for development, which merges the logs from all pods into one terminal output, and all the scrolling makes it pretty much unusable. It would be great if repeating messages could be moved to DEBUG level instead.

In the meantime, I am decreasing verbosity by setting the PG log level to "FATAL" during Stolon cluster init in development, which is less than ideal, of course:

kubectl run -i -t stolonctl --image=sorintlab/stolon:v0.13.0-pg11 --restart=Never \
  --rm -- /usr/local/bin/stolonctl --cluster-name=kube-stolon --store-backend=kubernetes \
  --kube-resource-kind=configmap init -y '{"initMode": "new", "pgParameters": {"log_min_messages": "FATAL"}}'

@icefed commented Jun 25, 2019

Debug is OK, but I think it should log only when the status changes.
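
As a sketch of that idea (hypothetical names, not stolon's actual keeper code):

package main

import "log"

// logRoleIfChanged logs the requested role only when it differs from
// the last observed value, keeping steady-state check loops silent.
func logRoleIfChanged(last *string, role string) {
    if *last == role {
        return
    }
    log.Printf("[I] our db requested role is %s", role)
    *last = role
}

func main() {
    var last string
    // Simulated check loop: only the first "standby" and the switch
    // to "master" produce log lines.
    for _, role := range []string{"standby", "standby", "standby", "master"} {
        logRoleIfChanged(&last, role)
    }
}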
