Postgres Backup not working in vCluster #4086

farhaan-shamsee · 2025-02-04T05:32:29Z

Platform: (EKS v1.30.4-eks-a737599, vCluster v0.15.2)
PGO Image Tag: (e.g. 5.5.0)
Postgres Version (e.g. 14.9)
Storage: (e.g. gp3)

Questions

We are using a PostgresCluster for our Keycloak instance. We are observing that pgBackRest fails to connect to the *-repo-host pods inside the vCluster.

After some time the disk runs out of space because WAL files keep accumulating. Below is a screenshot showing this.

Below is a screenshot of the database disk usage after I re-created it.

I see the following files in the WAL directory:

Under archive_status I see the following:

I read in the docs that the *.ready files should become *.done once the corresponding WAL segment has been archived successfully.
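PostgreSQL writes a .ready file for each completed WAL segment and renames it to .done only after archive_command succeeds, so a .ready count that keeps growing while .done stays flat means archiving is failing and the WAL is being retained. A minimal sketch for counting both, assuming it is run inside the database container (for example via kubectl exec); the function name is hypothetical, and the directory you pass depends on where your PGDATA lives.

```shell
# Count WAL segments still waiting to be archived (*.ready) versus those
# already archived (*.done) in a pg_wal/archive_status directory.
# archive_status_counts is a hypothetical helper name.
archive_status_counts() {
  dir="$1"
  ready=$(find "$dir" -maxdepth 1 -type f -name '*.ready' | wc -l | tr -d ' ')
  done_count=$(find "$dir" -maxdepth 1 -type f -name '*.done' | wc -l | tr -d ' ')
  echo "ready=$ready done=$done_count"
}
```

Running it a few minutes apart and comparing the two numbers shows whether archiving is making any progress at all.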

The same configuration works in my host cluster (EKS).

Can anyone help me figure this out?


andrewlecuyer · 2025-02-12T00:17:59Z

Hi @farhaan-shamsee, just wanted to let you know that I'm seeing the same issue when deploying a PostgresCluster within a vCluster. From what I can tell, this appears to be an issue with vCluster networking rather than anything specific to PGO.

More specifically, the DNS name PGO uses to connect to the repo host (e.g. hippo-repo-host-0.hippo-pods.postgres-operator.svc.kubernetes in my PostgresCluster named hippo) should definitely be valid, and the fact that this works just fine outside of a vCluster confirms it.
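The repo host name above follows the usual Kubernetes pattern for a Pod behind a headless Service: pod, service, namespace, "svc", then the cluster domain. A small sketch for building that name and checking whether it resolves from inside a Pod; both function names are hypothetical, and it assumes getent is available in the container image.

```shell
# Build the DNS name of a Pod behind a headless Service:
#   <pod>.<service>.<namespace>.svc.<cluster-domain>
pod_fqdn() {
  printf '%s.%s.%s.svc.%s\n' "$1" "$2" "$3" "$4"
}

# Report whether a name resolves with this Pod's resolver configuration.
# getent consults /etc/hosts as well as DNS, following nsswitch.conf.
check_resolves() {
  if getent hosts "$1" >/dev/null 2>&1; then
    echo "resolves: $1"
  else
    echo "does not resolve: $1"
    return 1
  fi
}
```

For the cluster named hippo, `check_resolves "$(pod_fqdn hippo-repo-host-0 hippo-pods postgres-operator kubernetes)"` checks the exact name quoted above.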

So input from the vCluster project is likely needed to understand why this isn't working.

andrewlecuyer · 2025-02-13T17:53:12Z

This appears to be a DNS issue caused by vCluster adding hostAliases to the Pods it creates, which in turn affects DNS for Services.
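One way to see those hostAliases is to read them from the Pod spec, or to look at /etc/hosts inside the container, since Kubernetes writes hostAliases entries into the kubelet-managed /etc/hosts. A sketch with hypothetical function names; whether the field is visible depends on which context kubectl points at (the virtual cluster or the host cluster namespace where vCluster syncs its Pods), which is an assumption to verify.

```shell
# Print the hostAliases field of a Pod's spec (empty output if none are set).
show_host_aliases() {
  ns="$1"; pod="$2"
  kubectl -n "$ns" get pod "$pod" -o jsonpath='{.spec.hostAliases}'
}

# hostAliases entries end up in /etc/hosts inside the container, so this
# shows what the Pod will actually resolve before it falls back to DNS.
show_etc_hosts() {
  ns="$1"; pod="$2"
  kubectl -n "$ns" exec "$pod" -- cat /etc/hosts
}
```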

@farhaan-shamsee can you try setting the following environment variable in your PGO deployment, and let me know what you get?

```shell
kubectl -n postgres-operator set env deploy/pgo GODEBUG=netdns=cgo
```

Thanks!
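For context, GODEBUG=netdns=cgo asks Go's net package to use the C library resolver instead of the pure-Go one (netdns=go forces the pure-Go resolver), and this only has an effect if the binary was built with cgo support. A sketch of setting the variable, removing it again if it causes problems, and collecting the Operator logs; the function names are hypothetical, while the namespace and Deployment come from the command above.

```shell
# Set GODEBUG on the PGO Deployment. "cgo" selects the C library resolver,
# "go" the pure-Go resolver; appending "+1" (e.g. "cgo+1") also logs the
# resolver decision.
pgo_set_netdns() {
  kubectl -n postgres-operator set env deploy/pgo "GODEBUG=netdns=$1"
}

# Remove GODEBUG from the Deployment; a trailing "-" on the name tells
# kubectl set env to unset the variable.
pgo_unset_godebug() {
  kubectl -n postgres-operator set env deploy/pgo GODEBUG-
}

# Show recent Operator log lines (default: last 100).
pgo_logs() {
  kubectl -n postgres-operator logs deploy/pgo --tail="${1:-100}"
}
```

Changing the Deployment's environment triggers a rollout, so the logs are worth collecting once the new Operator Pod is running.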

farhaan-shamsee · 2025-02-14T05:45:09Z

Thanks a lot for helping me out.

> @farhaan-shamsee can you try setting the following environment variable in your PGO deployment, and let me know what you get?

What specifically should I share?

These are the PGO logs:

After I updated the environment variable, the pgha1-* pod went into an error state, and now I see the following logs:
