pg_rewind failure after the postgres had crashed #295

smkingsoft · 2017-06-14T03:47:37Z

A power failure on the machine crashed the system, so postgres was not shut down cleanly.

[I] 2017-06-13T09:13:19Z postgresql.go:513: running pg_rewind
[E] 2017-06-13T09:13:19Z keeper.go:619: error syncing with pg_rewind error=error: exit status 1, output: fetched file "global/pg_control", length 8192

target server must be shut down cleanly
Failure, exiting
[I] 2017-06-13T09:13:20Z postgresql.go:552: running pg_basebackup

Maybe the keeper could restart postgres, shut it down cleanly right after crash recovery has finished, and then run pg_rewind.
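To make the suggestion concrete, here is a rough Go sketch of the idea, not stolon's actual code. It reads the `Database cluster state` line from `pg_controldata` output and, if the cluster was not shut down cleanly, plans a start/stop cycle before `pg_rewind`. The function names (`clusterState`, `needsCleanShutdown`, `rewindPlan`), the data directory path and the connection string are all hypothetical, and the sketch only builds the command lines instead of executing them.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// clusterState extracts the "Database cluster state" value from the text
// printed by pg_controldata. It returns "" when the line is missing.
func clusterState(controlData string) string {
	sc := bufio.NewScanner(strings.NewReader(controlData))
	for sc.Scan() {
		key, value, found := strings.Cut(sc.Text(), ":")
		if found && strings.TrimSpace(key) == "Database cluster state" {
			return strings.TrimSpace(value)
		}
	}
	return ""
}

// needsCleanShutdown reports whether pg_rewind would reject the target with
// "target server must be shut down cleanly". An unknown (empty) state is
// treated as unclean.
func needsCleanShutdown(state string) bool {
	return state != "shut down" && state != "shut down in recovery"
}

// rewindPlan returns the commands to run, in order. Hypothetical helper:
// it only builds argument lists and does not execute anything.
func rewindPlan(state, dataDir, sourceConn string) [][]string {
	var plan [][]string
	if needsCleanShutdown(state) {
		plan = append(plan,
			// Start postgres and wait, so crash recovery can complete.
			[]string{"pg_ctl", "start", "-w", "-D", dataDir},
			// Stop it again right away to get a clean shutdown.
			[]string{"pg_ctl", "stop", "-w", "-m", "fast", "-D", dataDir},
		)
	}
	plan = append(plan, []string{
		"pg_rewind",
		"--target-pgdata=" + dataDir,
		"--source-server=" + sourceConn,
	})
	return plan
}

func main() {
	// Sample pg_controldata excerpt for a crashed instance (illustrative).
	sample := "Database cluster state:               in production\n"
	// Data directory and connection string are placeholders.
	for _, cmd := range rewindPlan(clusterState(sample), "/path/to/pgdata", "host=newmaster") {
		fmt.Println(strings.Join(cmd, " "))
	}
}
```

A real implementation would also have to keep clients from connecting to the old master during the temporary start, which this sketch ignores.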


jordijansen · 2018-10-10T10:09:23Z

I've seen this happen on one of our nodes as well. See the following log:

> 2018-10-10T09:52:36.721Z        INFO    cmd/keeper.go:1893      exclusive lock on data dir taken
> 2018-10-10T09:52:36.857Z        INFO    cmd/keeper.go:501       keeper uid      {"uid": "dc1_01"}
> 2018-10-10T09:52:41.894Z        ERROR   cmd/keeper.go:732       error retrieving cluster data   {"error": "context deadline exceeded"}
> 2018-10-10T09:52:42.049Z        INFO    cmd/keeper.go:994       no db assigned
> 2018-10-10T09:52:42.136Z        ERROR   cmd/keeper.go:650       cannot get configured pg parameters     {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
> 2018-10-10T09:52:44.637Z        ERROR   cmd/keeper.go:650       cannot get configured pg parameters     {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
> 2018-10-10T09:52:47.055Z        INFO    cmd/keeper.go:1049      current db UID different than cluster data db UID       {"db": "562ed5b7", "cdDB": "8f1b9e2a"}
> 2018-10-10T09:52:47.055Z        INFO    cmd/keeper.go:1196      resyncing the database cluster
> 2018-10-10T09:52:47.138Z        ERROR   cmd/keeper.go:650       cannot get configured pg parameters     {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
> 2018-10-10T09:52:47.185Z        INFO    cmd/keeper.go:810       syncing using pg_rewind {"followedDB": "f26aa8d4", "keeper": "dc1_03"}
> 2018-10-10T09:52:47.185Z        INFO    postgresql/postgresql.go:811    running pg_rewind
> fetched file "global/pg_control", length 8192
> 
> target server must be shut down cleanly
> Failure, exiting
> 2018-10-10T09:52:47.553Z        ERROR   cmd/keeper.go:813       error syncing with pg_rewind    {"error": "error: exit status 1"}
> 2018-10-10T09:53:00.998Z        INFO    cmd/keeper.go:838       syncing from followed db        {"followedDB": "f26aa8d4", "keeper": "dc1_03"}
> 2018-10-10T09:53:00.998Z        INFO    postgresql/postgresql.go:852    running pg_basebackup
>

sgotti · 2018-10-10T10:53:32Z

@jordijansen @smkingsoft There's an open PR for this: #306. It would help if someone were willing to test it.

jordijansen · 2018-11-22T08:15:35Z

@sgotti I'm not sure how to test this PR. It seems to me that when a master comes back online after another keeper has been elected as the new master, it uses pg_basebackup instead of pg_rewind.

smkingsoft mentioned this issue Jun 14, 2017

stolon-keeper execute pg_rewind and pg_basebackup error #289

Closed
