- Message in prometheus-alerts: "Praefect error rate is too high"
- Visit the Praefect Dashboard.
- Notice if any error type is spiking.
- Filter by index pattern pubsub-praefect-inf-gprd*
- Search for:
  - "all SubConns are in TransientFailure": indicates there may be a node that Praefect cannot reach.
  - "PermissionDenied": indicates a mismatch between the token field under a [virtual_storage.node] and the token under [auth] in the corresponding Gitaly config.toml.
- Go to https://dashboards.gitlab.net/dashboard/db/praefect?panelId=2&fullscreen and identify the instance with a high error rate.
- SSH into that instance and check its Praefect server log for the post-mortem:
sudo less /var/log/gitlab/praefect/current
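
As a rough aid to the post-mortem, the two error strings listed above can also be counted directly in the local log. This is a minimal sketch that assumes the same strings appear verbatim in the on-host log file. `count_praefect_errors` is a hypothetical helper name, not part of Praefect or GitLab tooling.

```shell
#!/bin/sh
# Hypothetical helper: count lines containing each known Praefect error string.
# The default path is the log file named in this runbook; pass another path to override.
count_praefect_errors() {
  log_file="${1:-/var/log/gitlab/praefect/current}"
  for pattern in "all SubConns are in TransientFailure" "PermissionDenied"; do
    # -F matches the pattern as a fixed string; -c prints the number of matching lines.
    # grep exits non-zero when nothing matches, so "|| true" keeps the loop going.
    count=$(grep -c -F -- "$pattern" "$log_file" || true)
    printf '%s: %s\n' "$pattern" "$count"
  done
}
```

Reading the log may require root, so the function would typically be sourced and called from a root shell, for example `count_praefect_errors /var/log/gitlab/praefect/current`. A high count for one string points to which of the two causes above to investigate first.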