pg_rewind failure after the postgres had crashed #295

Open
smkingsoft opened this issue Jun 14, 2017 · 3 comments

@smkingsoft

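A power failure crashed the machine, and PostgreSQL crashed with it. Afterwards pg_rewind refused to run because the data directory had not been shut down cleanly, and the keeper fell back to pg_basebackup: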

> [I] 2017-06-13T09:13:19Z postgresql.go:513: running pg_rewind
> [E] 2017-06-13T09:13:19Z keeper.go:619: error syncing with pg_rewind error=error: exit status 1, output: fetched file "global/pg_control", length 8192
> 
> target server must be shut down cleanly
> Failure, exiting
> [I] 2017-06-13T09:13:20Z postgresql.go:552: running pg_basebackup

Maybe you could restart it, shut it down right after crash recovery has finished, and then run pg_rewind.
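The workaround suggested here can be sketched in Go, the language the keeper appears to be written in (judging by the `.go` file names in the logs). This is a minimal sketch, not the keeper's code: `clusterState`, `cleanlyShutDown` and `runSingleUserRecovery` are hypothetical helpers. It assumes that pg_rewind only accepts a target whose `pg_controldata` "Database cluster state" is `shut down` or `shut down in recovery` (the check behind "target server must be shut down cleanly"), and it uses PostgreSQL's single-user mode with stdin at EOF to let crash recovery finish and then shut the server down cleanly. It also assumes the server is stopped, the `postgres` binary is on `PATH`, and the data directory is not configured as a standby.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"os/exec"
	"strings"
)

// clusterState returns the value of the "Database cluster state:" line in the
// text printed by pg_controldata (hypothetical helper, not keeper code).
func clusterState(controldataOutput string) (string, error) {
	const prefix = "Database cluster state:"
	scanner := bufio.NewScanner(strings.NewReader(controldataOutput))
	for scanner.Scan() {
		line := scanner.Text()
		if strings.HasPrefix(line, prefix) {
			return strings.TrimSpace(strings.TrimPrefix(line, prefix)), nil
		}
	}
	if err := scanner.Err(); err != nil {
		return "", err
	}
	return "", errors.New("\"Database cluster state\" line not found")
}

// cleanlyShutDown reports whether the state is one pg_rewind accepts for its
// target; anything else (e.g. "in production" after a power failure) leads
// to "target server must be shut down cleanly".
func cleanlyShutDown(state string) bool {
	return state == "shut down" || state == "shut down in recovery"
}

// runSingleUserRecovery starts postgres in single-user mode on dataDir
// (hypothetical helper). cmd.Stdin is left nil, so os/exec connects it to
// the null device: the server completes crash recovery, reads EOF, and
// exits with a clean shutdown.
func runSingleUserRecovery(dataDir string) error {
	cmd := exec.Command("postgres", "--single", "-D", dataDir, "template1")
	out, err := cmd.CombinedOutput()
	if err != nil {
		return fmt.Errorf("single-user recovery failed: %v: %s", err, out)
	}
	return nil
}

func main() {
	// Abbreviated pg_controldata output from a primary that lost power.
	sample := "pg_control version number:            960\n" +
		"Database cluster state:               in production\n"
	state, err := clusterState(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if !cleanlyShutDown(state) {
		fmt.Printf("state %q: finish crash recovery before pg_rewind\n", state)
		// In real use: call runSingleUserRecovery(dataDir), then run pg_rewind.
	}
}
```

Checking the state first means the extra startup only happens when pg_rewind would refuse the directory anyway; a cleanly stopped node goes straight to the rewind.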

@jordijansen

jordijansen commented Oct 10, 2018

I've seen this happen on one of our nodes as well. See the following log:

> 2018-10-10T09:52:36.721Z        INFO    cmd/keeper.go:1893      exclusive lock on data dir taken
> 2018-10-10T09:52:36.857Z        INFO    cmd/keeper.go:501       keeper uid      {"uid": "dc1_01"}
> 2018-10-10T09:52:41.894Z        ERROR   cmd/keeper.go:732       error retrieving cluster data   {"error": "context deadline exceeded"}
> 2018-10-10T09:52:42.049Z        INFO    cmd/keeper.go:994       no db assigned
> 2018-10-10T09:52:42.136Z        ERROR   cmd/keeper.go:650       cannot get configured pg parameters     {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
> 2018-10-10T09:52:44.637Z        ERROR   cmd/keeper.go:650       cannot get configured pg parameters     {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
> 2018-10-10T09:52:47.055Z        INFO    cmd/keeper.go:1049      current db UID different than cluster data db UID       {"db": "562ed5b7", "cdDB": "8f1b9e2a"}
> 2018-10-10T09:52:47.055Z        INFO    cmd/keeper.go:1196      resyncing the database cluster
> 2018-10-10T09:52:47.138Z        ERROR   cmd/keeper.go:650       cannot get configured pg parameters     {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
> 2018-10-10T09:52:47.185Z        INFO    cmd/keeper.go:810       syncing using pg_rewind {"followedDB": "f26aa8d4", "keeper": "dc1_03"}
> 2018-10-10T09:52:47.185Z        INFO    postgresql/postgresql.go:811    running pg_rewind
> fetched file "global/pg_control", length 8192
> 
> target server must be shut down cleanly
> Failure, exiting
> 2018-10-10T09:52:47.553Z        ERROR   cmd/keeper.go:813       error syncing with pg_rewind    {"error": "error: exit status 1"}
> 2018-10-10T09:53:00.998Z        INFO    cmd/keeper.go:838       syncing from followed db        {"followedDB": "f26aa8d4", "keeper": "dc1_03"}
> 2018-10-10T09:53:00.998Z        INFO    postgresql/postgresql.go:852    running pg_basebackup
> 

@sgotti
Member

sgotti commented Oct 10, 2018

@jordijansen @smkingsoft There's an open PR for this: #306, if someone is willing to test it.

@jordijansen

@sgotti I'm not sure how to test this PR. It seems to me that when a master comes back online after another keeper has been elected as the new master, it resyncs with pg_basebackup instead of pg_rewind.
