Compactor exposes error metric when failing over to another instance #16388

jan-kantert · 2025-02-20T14:12:33Z

Describe the bug
When the loki-backend instance that runs the compactor shuts down and comes back up again, the failure count in the loki_boltdb_shipper_compact_tables_operation_total metric increases.

To Reproduce
Steps to reproduce the behavior:

Start Loki (3.3.2) in scalable mode
Restart the pod running the compactor
Observe delta(loki_boltdb_shipper_compact_tables_operation_total{status="failure"}[5m]) via Prometheus (or query the metrics endpoint)
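For anyone checking this without Prometheus, here is a minimal sketch of reading the counter straight from the metrics endpoint text. The sample payload and its values are invented for illustration, and the helper names (`read_counter`, `failure_delta`) are hypothetical; only the metric name and the `status="failure"` label come from this report.

```python
import re

METRIC = "loki_boltdb_shipper_compact_tables_operation_total"

# Hypothetical sample of the text a metrics endpoint might return.
# The numbers are made up; only the metric name and label are from the report.
SAMPLE = '''\
# TYPE loki_boltdb_shipper_compact_tables_operation_total counter
loki_boltdb_shipper_compact_tables_operation_total{status="success"} 41
loki_boltdb_shipper_compact_tables_operation_total{status="failure"} 2
'''


def read_counter(text, metric, status):
    """Return the counter value for the given status label, or None if absent."""
    pattern = re.compile(r"^" + re.escape(metric) + r"\{([^}]*)\}\s+(\S+)")
    for line in text.splitlines():
        match = pattern.match(line)
        if match is None:
            continue  # comment line or a different metric
        labels = dict(re.findall(r'(\w+)="([^"]*)"', match.group(1)))
        if labels.get("status") == status:
            return float(match.group(2))
    return None


def failure_delta(before, after, metric=METRIC):
    """Difference in the failure counter between two scrapes of the same pod."""
    b = read_counter(before, metric, "failure") or 0.0
    a = read_counter(after, metric, "failure") or 0.0
    return a - b
```

Take one scrape before the restart and one after the failover settles, then compare them with `failure_delta`. Note that this raw difference does not account for the counter resetting to zero when the process itself restarts, so compare scrapes from the instance that stayed up.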

Expected behavior
When the loki-backend instance that runs the compactor restarts, we expect a graceful failover of the compactor. We expect loki_boltdb_shipper_compact_tables_operation_total not to count any failure in that case.

Environment:

Infrastructure: Kubernetes 1.30
Deployment tool: helm

Screenshots, Promtail config, or terminal output
Log lines when this happens:

loki-backend-2 - info: finished compacting table 
loki-backend-2 - info: compacting table 
loki-backend-2 - info: finished compacting table 
loki-backend-1 - info: this instance has been chosen to run the compactor, starting compactor 
loki-backend-1 - info: waiting 10m0s for ring to stay stable and previous compactions to finish before starting compactor 
loki-backend-2 - info: compactor exiting 
loki-backend-2 - info: waiting until compactor is JOINING in the ring 
loki-backend-2 - info: compactor is JOINING in the ring 
loki-backend-2 - info: waiting until compactor is ACTIVE in the ring 
loki-backend-2 - info: compactor is ACTIVE in the ring 
loki-backend-1 - info: this instance should no longer run the compactor, stopping compactor 
loki-backend-1 - info: compactor stopped 
loki-backend-1 - error: failed to run compaction - failed to list tables: RequestCanceled: request context canceled
caused by: context canceled
loki-backend-2 - info: this instance has been chosen to run the compactor, starting compactor 
loki-backend-1 - info: compactor started 
loki-backend-2 - info: waiting 10m0s for ring to stay stable and previous compactions to finish before starting compactor
