Postgres Backup not working in vCluster #4086

Open
farhaan-shamsee opened this issue Feb 4, 2025 · 3 comments

Comments

@farhaan-shamsee

  • Platform: EKS v1.30.4-eks-a737599, vCluster v0.15.2
  • PGO Image Tag: (e.g. 5.5.0)
  • Postgres Version (e.g. 14.9)
  • Storage: (e.g. gp3)

Questions

We are using a PostgresCluster for our Keycloak instance. We are observing that pgBackRest fails to connect to the *-repo-host pods inside the vCluster.

Image

After some time the disk runs out of space because WAL files keep accumulating. Below is a screenshot of this.

Image

Below is a screenshot of the database disk usage after I re-created it.

Image

I see the following files in the WAL directory:

Image

Under archive_status I see the following:

Image

I read in the docs that each *.ready file should become *.done once its WAL segment has been archived successfully.
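To quantify how far archiving has fallen behind, one could count the status files by suffix. The following is only a minimal sketch: the function name `summarize_archive_status` is hypothetical, and the directory to inspect (the `archive_status` folder inside the WAL directory) has to be supplied by the caller, since its exact path inside the Pod is not stated in this thread.

```python
from collections import Counter
from pathlib import Path


def summarize_archive_status(status_dir):
    """Count WAL status files by suffix.

    A ".ready" file marks a segment still waiting to be archived;
    a ".done" file marks a segment that was archived successfully.
    """
    counts = Counter()
    for entry in Path(status_dir).iterdir():
        # Ignore anything that is not a .ready/.done marker file.
        if entry.is_file() and entry.suffix in (".ready", ".done"):
            counts[entry.suffix.lstrip(".")] += 1
    return {"ready": counts["ready"], "done": counts["done"]}
```

A steadily growing `ready` count with a flat `done` count would be consistent with the archive command failing, which matches the symptom of WAL filling the disk.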

The same configuration works in my host cluster (EKS).

Can anyone help me figure this out?

@andrewlecuyer
Collaborator

Hi @farhaan-shamsee, just wanted to let you know that I'm seeing the same issue when deploying a PostgresCluster within a vCluster. From what I can tell, this appears to be an issue with vCluster networking rather than anything specific to PGO.

More specifically, the DNS name PGO is using to connect to the repo host (e.g. hippo-repo-host-0.hippo-pods.postgres-operator.svc.kubernetes in my PostgresCluster named hippo) should definitely be valid. And the fact that this works just fine outside of a vCluster confirms this.

So it looks like some input from the vCluster project is likely required to get a better idea as to why this isn't working.
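One way to compare name resolution inside and outside the vCluster is to run a small lookup from a Pod in each environment. This is a rough sketch, not part of PGO: the helper name `resolve` is hypothetical, and it relies on Python's `socket.getaddrinfo`, which goes through the system (libc) resolver, so its result may differ from what a Go binary using its built-in resolver sees.

```python
import socket


def resolve(hostname):
    """Return the sorted unique IPs the system resolver gives for hostname.

    Returns an empty list when the name cannot be resolved.
    """
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})
```

For example, calling `resolve()` with the repo host DNS name quoted above, once from a Pod in the host cluster and once from a Pod in the vCluster, would show whether the name resolves at all and whether both environments agree on the address.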

@andrewlecuyer
Collaborator

This appears to be an issue with DNS due to vCluster adding hostAliases to the various Pods it creates, which is affecting the creation of DNS for services.
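Kubernetes writes hostAliases into each Pod's /etc/hosts, so inspecting that file is one way to see which names are affected. The sketch below parses hosts-file text into a name-to-IP mapping; the function name `hosts_entries` is hypothetical, and whether a hosts entry actually shadows the repo host name in a given Pod is an assumption to verify, not something confirmed in this thread.

```python
def hosts_entries(hosts_text):
    """Map each hostname in /etc/hosts-style text to its IP address."""
    mapping = {}
    for line in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            # Keep the first mapping seen for a given name.
            mapping.setdefault(name, ip)
    return mapping
```

Inside a Pod, `hosts_entries(open("/etc/hosts").read())` would list every name that is answered from the hosts file before any DNS query is made.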

@farhaan-shamsee can you try setting the following environment variable in your PGO deployment, and let me know what you get?

kubectl -n postgres-operator set env deploy/pgo GODEBUG=netdns=cgo

Thanks!

@farhaan-shamsee
Author

farhaan-shamsee commented Feb 14, 2025

Thanks a lot for helping me out.

> @farhaan-shamsee can you try setting the following environment variable in your PGO deployment, and let me know what you get?

What specifically should I share?

These are the PGO logs:

Image

After I updated the environment variable, the pgha1-* pod went into an error state, and I now see the following logs:

Image
