From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
Cc: | Tomas Vondra <tomas(at)vondra(dot)me>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Refactoring postmaster's code to cleanup after child exit |
Date: | 2025-03-04 22:58:42 |
Message-ID: | gdojfusnbe3ae47n6qjezclpv4462xbdc2ssoadtyklgdw5dqb@b4r5yw3fcifm |
Lists: | pgsql-hackers |
Hi,
On 2024-12-10 12:00:12 +0200, Heikki Linnakangas wrote:
> On 09/12/2024 22:55, Heikki Linnakangas wrote:
> > Not sure how to fix this. A small sleep in the test would work, but in
> > principle there's no delay that's guaranteed to be enough. A more robust
> > solution would be to run a "select count(*) from pg_stat_activity" and
> > wait until the number of connections are what's expected. I'll try that
> > and see how complicated that gets..
>
> Checking pg_stat_activity doesn't help, because the backend doesn't register
> itself in pg_stat_activity until later. A connection that's rejected due to
> connection limits never shows up in pg_stat_activity.
>
> Some options:
>
> 0. Do nothing
>
> 1. Add a small sleep to the test
>
> 2. Move the pgstat_bestart() call earlier in the startup sequence, so that a
> backend shows up in pg_stat_activity before it acquires a PGPROC entry, and
> stays visible until after it has released its PGPROC entry. This would give
> more visibility to backends that are starting up.
We don't necessarily *have* a PGPROC entry for that backend when we run out of
connections, no?
> 3. Rearrange the FATAL error handling so that the process removes itself
> from PGPROC before sending the error to the client. That would be kind of
> nice anyway. Currently, if sending the rejection error message to the client
> blocks, you are holding up a PGPROC slot until the message is sent. The
> error message packet is short, so it's highly unlikely to block, but still.
This is definitely a problem; there was even a recent thread about it. It can
be triggered even with just an ERROR message though :(
For this test, could we perhaps rely on the messages the postmaster logs when
child processes exit?
2025-03-04 17:56:12.528 EST [3509838][not initialized][:0][[unknown]] LOG: connection received: host=[local]
2025-03-04 17:56:12.528 EST [3509838][client backend][:0][[unknown]] FATAL: sorry, too many clients already
2025-03-04 17:56:12.529 EST [3509817][postmaster][:0][] DEBUG: releasing pm child slot 2
2025-03-04 17:56:12.529 EST [3509817][postmaster][:0][] DEBUG: client backend (PID 3509838) exited with exit code 1
I.e. the test could wait for the 'client backend exited' message using
->wait_for_log()?
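To illustrate the idea of waiting on the log rather than sleeping, here is a rough, self-contained sketch in Python. It is not the TAP framework's ->wait_for_log() and makes no claim about that method's signature; the function name, its parameters (path, pattern, offset, timeout, interval) and the EXIT_PATTERN regex are all assumptions made up for this example. It only shows the general technique: remember an offset into the log file, then poll from that offset until the expected line appears or a timeout expires.

```python
import re
import time

# Hypothetical pattern modelled on the postmaster DEBUG line quoted above.
EXIT_PATTERN = r"client backend \(PID \d+\) exited with exit code 1"


def wait_for_log(path, pattern, offset=0, timeout=5.0, interval=0.05):
    """Poll the file at `path` until `pattern` matches text written at or
    after byte `offset`. Returns the end position of the data that was
    scanned, which a caller can pass as the offset of the next wait.
    Raises TimeoutError if nothing matches within `timeout` seconds."""
    regex = re.compile(pattern)
    deadline = time.monotonic() + timeout
    while True:
        # Re-read everything after the offset on each attempt, so a line
        # that was only partially written last time is seen in full.
        with open(path, "rb") as f:
            f.seek(offset)
            data = f.read()
        if regex.search(data.decode("utf-8", errors="replace")):
            return offset + len(data)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"pattern {pattern!r} not found in {path}")
        time.sleep(interval)
```

A test built this way would record the log size before attempting the connection that is expected to be rejected, then wait for EXIT_PATTERN from that offset before opening the next connection. Note that the exit message in the excerpt above is at DEBUG level, so the server's log level would have to be verbose enough for it to appear at all.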
Greetings,
Andres Freund