Concerns about stability of automatic restart of nodes feature #707
Comments
But when the master restarts, the standby should then start correctly, or am I missing something?
This is another feature and not blocking, since users can already restart "postgres" without a stolon command by restarting the keeper (see also the comment where I propose adding a signal handler to the keeper to just restart postgres: #255 (comment)). There are some similar issues (like #88) but they were closed. I'm personally against adding such a command for many reasons:
Your statement is correct when the master node restarts earlier than the standby after increasing the problematic parameter. But in the reverse case (when the standby restarts earlier than the master after decreasing the parameter), the standby node will see the old value of the parameter on the master forever and cannot restart normally. The change to that parameter would have to be reverted manually.
OK, I have understood your idea. Then I assume the sentinel is responsible for orchestrating the application of changes from `pgParameters` in the right order within the automatic restart feature.
IMO manual (or automated via an external script) postmaster killing looks like duct tape. Users should have a more convenient way to restart PostgreSQL (not keeper) nodes.
I could propose using some counter to specify the number of required restarts; see the sketch below.
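(The original snippet is elided above. Purely as an illustration of the counter idea, it could be a field in the cluster spec bumped via `stolonctl update`; the `requiredRestarts` name below is hypothetical, not an existing stolon option.)

```sh
# Hypothetical: bump a restart counter in the cluster spec; keepers that have
# performed fewer restarts than the counter would restart their postgres.
stolonctl --cluster-name mycluster update --patch '{ "requiredRestarts": 3 }'
```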
Your statement isn't quite right: the standby node does not "see" the old value of the parameter on the master and get stuck there. The standby node's configuration comes from its own postgresql.conf. From the PostgreSQL docs, the key point is that for these parameters the value on the standby must be equal to or greater than the value on the primary, so to decrease them you should change the primary first and the standbys afterwards.
The standby may fail to restart at first, but then the keeper will keep restarting it. However, this is going to happen at most once: the moment the master has restarted with the updated configuration, the standby restart will work.
Not exactly. The problem here is that at startup the standby relies on config data from the control file to determine whether the restart-critical parameters satisfy the invariant from the docs (the standby's values must be greater than or equal to the values on the primary).
This is described in the postgres sources. After the critical parameter is decreased on the master first, the new value is replicated to the standby via WAL replay and applied to its control file. But if the standby applies the decreased value earlier than that value is replicated from the master and written to the control file, then it cannot restart anymore (that invariant violation is fatal to the startup process). And if you don't revert to the old value, the only workaround is to fix the value in the control file manually, which looks like a big hack.
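For reference, the primary's values that the standby checks at startup are recorded in its control file and can be inspected with pg_controldata:

```sh
# On the standby: show the primary's settings recorded in the control file.
# These are the values the startup invariant is checked against.
pg_controldata "$PGDATA" | grep -E 'max_connections|max_locks_per_xact'
```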
IMO the most reasonable way to handle this issue is to check the invariant for the updated critical parameters on the keeper side (comparing them with the corresponding values from the local control file) before applying them and restarting the standby; a sketch follows.
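A minimal sketch of such a check, assuming shell access to the data directory (this is not stolon code; it only illustrates the proposed keeper-side guard):

```sh
# Before applying a decreased max_connections on a standby, verify it is not
# below the value the primary ran with, as recorded in the control file.
NEW_MAX_CONNECTIONS=50
CTL_MAX_CONNECTIONS=$(pg_controldata "$PGDATA" \
  | awk '/max_connections setting/ {print $NF}')
if [ "$NEW_MAX_CONNECTIONS" -lt "$CTL_MAX_CONNECTIONS" ]; then
  echo "unsafe: wait for the primary to restart with the new value first" >&2
else
  echo "safe to apply the new value and restart the standby"
fi
```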
I got this issue in my env. After I changed `max_connections` to a smaller value, the master was restarted and `max_connections` was updated, while the standby failed to start. I can't find a way to work around it except changing `max_connections` back to a larger value. Is there any way to work around this issue and change `max_connections` to a smaller one?
Hi, @jinguiy! Unfortunately there is no other solution here than to revert `max_connections`.
@maksm90 thanks. I tried deleting pods to restart them, but that way it takes time to restart the master, so the standby gets promoted to master, and the stolon cluster failed to start. What worked was restarting postgres in the pods in order; see the sketch below.
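For reference, a sketch of that working procedure when decreasing a parameter (pod names and the data directory path are illustrative): restart postgres on the master first, then on each standby.

```sh
# Master first: it logs the new, lower value to WAL, which the standby
# replays into its control file before its own restart.
kubectl exec stolon-keeper-0 -- pg_ctl -D /stolon-data/postgres restart -m fast
# Then each standby:
kubectl exec stolon-keeper-1 -- pg_ctl -D /stolon-data/postgres restart -m fast
```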
Submission type
Environment
Stolon version
0.14.0
Bug description
Some PostgreSQL parameters (`max_connections`, `max_locks_per_transaction`, etc.) require special care about the order in which nodes are restarted. This issue is described in the PostgreSQL docs. The current implementation of the auto restart feature (#568) doesn't take this into account and can lead to a problem.

There is another issue with the auto restart feature. A PostgreSQL instance performs a so-called shutdown checkpoint on a normal stop, when all dirty buffers are flushed to disk. For instances that accumulate mass changes in the buffer pool (e.g. with spread checkpoints), there can be long downtime when all nodes restart simultaneously. Therefore it makes sense to issue an explicit checkpoint before restarting nodes, as sketched below.
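A minimal sketch of the suggested pre-restart checkpoint (the invocations are illustrative):

```sh
# Flush dirty buffers ahead of time so the shutdown checkpoint at restart
# has little left to write, keeping the downtime window short.
psql -c 'CHECKPOINT;'
pg_ctl -D "$PGDATA" restart -m fast
```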
Steps to reproduce the problem
After decreasing `max_connections` from 100 to 50, the standby node cannot restart and fails with an error.
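The error text is elided above; it is most likely PostgreSQL's standard startup check failure, along these lines (exact wording varies by server version):

```
FATAL:  hot standby is not possible because max_connections = 50 is a lower setting than on the master server (its value was 100)
```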
Enhancement Description

As automatic node restart in the cluster has subtle issues, I propose delegating this to the user by exposing a command to explicitly restart nodes:

`stolonctl restartkeeper <keeper_name>`

as is implemented in patroni. When all the necessary infrastructure for auto restart is ready, the `automaticPgRestart` option can be brought back. Any thoughts?