Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Listener - connection being intermittently dropped when using Postgrest with PGCloudnative backed database. #3778

Open
ethanwaldieferry opened this issue Nov 12, 2024 · 4 comments

Comments

@ethanwaldieferry
Copy link

Environment

  • PostgreSQL version: ghcr.io/clevyr/cloudnativepg-timescale:16.4
  • PostgREST version: postgrest/postgrest:v12.2.3
  • Operating system:

Description of issue

We are trying to use the LISTEN/NOTIFY channel to force schema updates to our Postgrest instances. When we first boot our instances the channel seems to be open. However after a period of time (around 5 mins) I can see that the connection running the statement "LISTEN pgrest;" vanishes - with no logs inside postgrest!

If I restart my postgrest instances - I will see this "LISTEN pgrest;" session appear again - however it will still only persist for about 5 minutes.

I am able to trigger schema updates manually using NOTIFY in this 5 min window however I am unable to do so after. I have checked with multiple clients and can see other clients are receiving events - however postgrest is not.

I also ensured that POSTGREST was connected to the primary - directly without the presence of any load balancer.

Strangely I have a different cluster in K8s that is based on the Patroni operator where I do not have this issue.

Has anyone seen this before? Where the listener simply stop after a period of time and the connection vanishes.

The weirder part is that sometimes hours or days later the listener will come back online for a short while - only to disappear again.

My questions:

  • Is there something in the configuration of my postgres database that would cause the listener to fail intermittently.
  • Is there an issue with the exponential retry growing beyond 32 seconds?

Other experiements:

  • I tried killing all of postgrest's connections this worked and triggered a schema cache reload. However it did not bring back the listener which remains dead until I restart my postgrest instances.

Any ideas on this would be much appreciated. Ive really been running around in circles on this one!

@steve-chavez
Copy link
Member

Is there something in the configuration of my postgres database that would cause the listener to fail intermittently.

There are various timeouts (search "timeout" here). OTOH I don't remember which one affects the LISTEN, maybe the idle_session_timeout. Try querying pg_db_role_setting to see the applied settings.

@steve-chavez
Copy link
Member

Is there an issue with the exponential retry growing beyond 32 seconds?

So it is retrying after the LISTEN connection is lost? Please share the full logs.

@ethanwaldieferry
Copy link
Author

ethanwaldieferry commented Nov 13, 2024

Hey @steve-chavez thanks for the help! It is not retrying, I set my postgrest container to have its log-level at debug and when it loses the listener connection it fails silently. I checked, statement_timeout, transaction_timeout, lock_timeout, idle_in_transaction_session_timeout and idle_session_timeout are all disabled (set to 0).

I have found a way to replicate my issue however.

When I first boot my container I can see three connections in postgrest's pool:

"pid","usename","application_name","query_start","state","query"
454230,"authenticator","PostgREST 12.2.3","2024-11-13 11:01:20.98754+00","idle","COMMIT"
454222,"authenticator","PostgREST 12.2.3","2024-11-13 10:56:40.790304+00","idle","LISTEN ""pgrst"""
454229,"authenticator","PostgREST 12.2.3","2024-11-13 11:01:20.709907+00","idle","COMMIT"

And if I send a notification on the pgrest channel I can see:

│ 13/Nov/2024:11:02:08 +0000: Received a schema cache reload message on the "pgrst" channel                                                                                                                │
│ 13/Nov/2024:11:02:08 +0000: Connection c8670160-7e14-4183-9f1a-3953897de8a3 is terminated due to release                                                                                                 │
│ 13/Nov/2024:11:02:08 +0000: Connection 0153d1b5-e58e-4d64-98bc-988cffea057a is terminated due to release                                                                                                 │
│ 13/Nov/2024:11:02:08 +0000: Connection 0abad40e-a48a-4feb-8409-5591421e46ac is being established                                                                                                         │
│ 13/Nov/2024:11:02:08 +0000: Connection 0abad40e-a48a-4feb-8409-5591421e46ac is available                                                                                                                 │
│ 13/Nov/2024:11:02:08 +0000: Connection 0abad40e-a48a-4feb-8409-5591421e46ac is used                                                                                                                      │
│ 13/Nov/2024:11:02:08 +0000: Connection 0abad40e-a48a-4feb-8409-5591421e46ac is available                                                                                                                 │
│ 13/Nov/2024:11:02:08 +0000: Successfully connected to PostgreSQL 16.4 (Debian 16.4-1.pgdg110+2) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit                        │
│ 13/Nov/2024:11:02:08 +0000: Connection 0abad40e-a48a-4feb-8409-5591421e46ac is used                                                                                                                      │
│ 13/Nov/2024:11:02:08 +0000: Connection 0abad40e-a48a-4feb-8409-5591421e46ac is available                                                                                                                 │
│ 13/Nov/2024:11:02:08 +0000: Connection 0abad40e-a48a-4feb-8409-5591421e46ac is used                                                                                                                      │
│ 13/Nov/2024:11:02:08 +0000: Connection 0abad40e-a48a-4feb-8409-5591421e46ac is available                                                                                                                 │
│ 13/Nov/2024:11:02:08 +0000: Config reloaded                                                                                                                                                              │
│ 13/Nov/2024:11:02:08 +0000: Connection 0abad40e-a48a-4feb-8409-5591421e46ac is used                                                                                                                      │
│ 13/Nov/2024:11:02:08 +0000: Connection a5bb2ab3-ed39-4db1-9c82-60c9b3e0d70f is being established                                                                                                         │
│ 13/Nov/2024:11:02:08 +0000: Connection a5bb2ab3-ed39-4db1-9c82-60c9b3e0d70f is available                                                                                                                 │
│ 13/Nov/2024:11:02:08 +0000: Connection a5bb2ab3-ed39-4db1-9c82-60c9b3e0d70f is used                                                                                                                      │
│ 13/Nov/2024:11:02:08 +0000: Connection a5bb2ab3-ed39-4db1-9c82-60c9b3e0d70f is available

If I terminate these connections recently after the container has booted using:

select pg_terminate_backend(pid)
from pg_stat_activity
where application_name = 'PostgREST 12.2.3';

I can see that Postgrest will retry the listener connection.

│ FATAL:  terminating connection due to administrator command                                                                                                                                              │
│ 13/Nov/2024:11:04:25 +0000: Failed listening for notifications on the "pgrst" channel. SSL connection has been closed unexpectedly                                                                       │
│ 13/Nov/2024:11:04:25 +0000: Retrying listening for notifications in 1 seconds...                                                                                                                         │
│ 13/Nov/2024:11:04:26 +0000: Listening for notifications on the "pgrst" channel                                                                                                                           │
│ 13/Nov/2024:11:04:26 +0000: Connection fef6f773-c4a2-44bb-a714-7930d9db3305 is terminated due to release                                                                                                 │
│ 13/Nov/2024:11:04:26 +0000: Connection 881a7d5e-fa7b-4e1f-aed0-299a8c744243 is terminated due to release                                                                                                 │
│ 13/Nov/2024:11:04:26 +0000: Connection ceeadb60-cd0e-45a2-a401-229949b3d66a is being established                                                                                                         │
│ 13/Nov/2024:11:04:26 +0000: Connection ceeadb60-cd0e-45a2-a401-229949b3d66a is available                                                                                                                 │
│ 13/Nov/2024:11:04:26 +0000: Connection ceeadb60-cd0e-45a2-a401-229949b3d66a is used                                                                                                                      │
│ 13/Nov/2024:11:04:26 +0000: Connection ceeadb60-cd0e-45a2-a401-229949b3d66a is available                                                                                                                 │
│ 13/Nov/2024:11:04:26 +0000: Successfully connected to PostgreSQL 16.4 (Debian 16.4-1.pgdg110+2) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit                        │
│ 13/Nov/2024:11:04:26 +0000: Connection ceeadb60-cd0e-45a2-a401-229949b3d66a is used                                                                                                                      │
│ 13/Nov/2024:11:04:26 +0000: Connection ceeadb60-cd0e-45a2-a401-229949b3d66a is available                                                                                                                 │
│ 13/Nov/2024:11:04:26 +0000: Connection ceeadb60-cd0e-45a2-a401-229949b3d66a is used                                                                                                                      │
│ 13/Nov/2024:11:04:26 +0000: Connection ceeadb60-cd0e-45a2-a401-229949b3d66a is available                                                                                                                 │
│ 13/Nov/2024:11:04:26 +0000: Config reloaded                                                                                                                                                              │
│ 13/Nov/2024:11:04:26 +0000: Connection ceeadb60-cd0e-45a2-a401-229949b3d66a is used                                                                                                                      │
│ 13/Nov/2024:11:04:26 +0000: Connection ceeadb60-cd0e-45a2-a401-229949b3d66a is available                                                                                                                 │
│ 13/Nov/2024:11:04:26 +0000: Schema cache queried in 3.8 milliseconds                                                                                                                                     │
│ 13/Nov/2024:11:04:26 +0000: Schema cache loaded 18 Relations, 19 Relationships, 6 Functions, 0 Domain Representations, 4 Media Type Handlers, 1196 Timezones                                             │
│ 13/Nov/2024:11:04:26 +0000: Schema cache loaded in 0.6 milliseconds                                                                                                                                      │
│ 13/Nov/2024:11:04:29 +0000: Connection ceeadb60-cd0e-45a2-a401-229949b3d66a is used                                                                                                                      │
│ 13/Nov/2024:11:04:29 +0000: Connection ceeadb60-cd0e-45a2-a401-229949b3d66a is available

If however I wait for some time - I see only one connection:

"pid","usename","application_name","query_start","state","query"
456027,"authenticator","PostgREST 12.2.3","2024-11-13 11:38:18.949938+00","idle","COMMIT"

If I force a re-connection:

select pg_terminate_backend(pid)
from pg_stat_activity
where application_name = 'PostgREST 12.2.3';

Postgrest does not restart the listener:

│ 13/Nov/2024:11:37:36 +0000: Connection 8505695e-01a2-4dda-8bc7-33ae44019964 is being established                                                                                                         │
│ 13/Nov/2024:11:37:36 +0000: Connection 8505695e-01a2-4dda-8bc7-33ae44019964 is available                                                                                                                 │
│ 13/Nov/2024:11:37:36 +0000: Connection 8505695e-01a2-4dda-8bc7-33ae44019964 is used                                                                                                                      │
│ 13/Nov/2024:11:37:36 +0000: Connection 8505695e-01a2-4dda-8bc7-33ae44019964 is available                                                                                                                 │
│ 13/Nov/2024:11:37:36 +0000: Successfully connected to PostgreSQL 16.4 (Debian 16.4-1.pgdg110+2) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit                        │
│ 13/Nov/2024:11:37:36 +0000: Connection 8505695e-01a2-4dda-8bc7-33ae44019964 is used                                                                                                                      │
│ 13/Nov/2024:11:37:36 +0000: Connection 8505695e-01a2-4dda-8bc7-33ae44019964 is available                                                                                                                 │
│ 13/Nov/2024:11:37:36 +0000: Connection 8505695e-01a2-4dda-8bc7-33ae44019964 is used                                                                                                                      │
│ 13/Nov/2024:11:37:36 +0000: Connection 8505695e-01a2-4dda-8bc7-33ae44019964 is available                                                                                                                 │
│ 13/Nov/2024:11:37:36 +0000: Config reloaded                                                                                                                                                              │
│ 13/Nov/2024:11:37:36 +0000: Connection 8505695e-01a2-4dda-8bc7-33ae44019964 is used                                                                                                                      │
│ 13/Nov/2024:11:37:36 +0000: Connection 8505695e-01a2-4dda-8bc7-33ae44019964 is available                                                                                                                 │
│ 13/Nov/2024:11:37:36 +0000: Schema cache queried in 3.8 milliseconds                                                                                                                                     │
│ 13/Nov/2024:11:37:36 +0000: Schema cache loaded 18 Relations, 19 Relationships, 6 Functions, 0 Domain Representations, 4 Media Type Handlers, 1196 Timezones                                             │
│ 13/Nov/2024:11:37:36 +0000: Schema cache loaded in 0.7 milliseconds                                                                                                                                      │
│ 13/Nov/2024:11:37:36 +0000: Connection 8505695e-01a2-4dda-8bc7-33ae44019964 is used                                                                                                                      │
│ 13/Nov/2024:11:37:36 +0000: Connection 8505695e-01a2-4dda-8bc7-33ae44019964 is available

The timeline is visible in the logs:

13/Nov/2024:10:56:40 +0000: Listening for notifications on the "pgrst" channel
13/Nov/2024:10:56:47 +0000: Received a schema cache reload message on the "pgrst" channel
13/Nov/2024:11:02:08 +0000: Received a schema cache reload message on the "pgrst" channel
13/Nov/2024:11:02:48 +0000: Failed listening for notifications on the "pgrst" channel. SSL connection has been closed unexpectedly
13/Nov/2024:11:02:49 +0000: Listening for notifications on the "pgrst" channel
13/Nov/2024:11:03:36 +0000: Failed listening for notifications on the "pgrst" channel. SSL connection has been closed unexpectedly
13/Nov/2024:11:03:37 +0000: Listening for notifications on the "pgrst" channel
13/Nov/2024:11:03:50 +0000: Failed listening for notifications on the "pgrst" channel. SSL connection has been closed unexpectedly
13/Nov/2024:11:03:51 +0000: Listening for notifications on the "pgrst" channel
13/Nov/2024:11:04:25 +0000: Failed listening for notifications on the "pgrst" channel. SSL connection has been closed unexpectedly
13/Nov/2024:11:04:26 +0000: Listening for notifications on the "pgrst" channel
13/Nov/2024:11:05:35 +0000: Received a schema cache reload message on the "pgrst" channel
13/Nov/2024:11:05:42 +0000: Received a schema cache reload message on the "pgrst" channel

Between 11:05 and 11:37 the listener connection died and was not revived by postgrest. Note I manually forced those re-connections between 11:02 and 11:04.

Does the Listener have any kind of keep-alive?

@steve-chavez
Copy link
Member

Unfortunately I'm unable to reproduce locally. When I kill the pg connections with pg_terminate_backend, I don't get the SSL connection has been closed unexpectedly error but a server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. Maybe that error triggers this behavior.

@ethanwaldieferry Can you reproduce this reliably? Could you reproduce with a connection to a non-containerized postgres? Or perhaps to a different container image (like https://hub.docker.com/_/postgres) ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants