Re: Refactoring postmaster's code to cleanup after child exit

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Refactoring postmaster's code to cleanup after child exit
Date: 2024-12-10 10:00:12
Message-ID: 4d964960-72fa-4741-8a7c-879b958eaa29@iki.fi
Lists: pgsql-hackers

On 09/12/2024 22:55, Heikki Linnakangas wrote:
> Not sure how to fix this. A small sleep in the test would work, but in
> principle there's no delay that's guaranteed to be enough. A more robust
> solution would be to run a "select count(*) from pg_stat_activity" and
> wait until the number of connections are what's expected. I'll try that
> and see how complicated that gets..

Checking pg_stat_activity doesn't help, because the backend doesn't
register itself in pg_stat_activity until later. A connection that's
rejected due to connection limits never shows up in pg_stat_activity.

Some options:

0. Do nothing

1. Add a small sleep to the test

2. Move the pgstat_bestart() call earlier in the startup sequence, so
that a backend shows up in pg_stat_activity before it acquires a PGPROC
entry, and stays visible until after it has released its PGPROC entry.
This would give more visibility to backends that are starting up.

3. Rearrange the FATAL error handling so that the process removes itself
from PGPROC before sending the error to the client. That would be kind
of nice anyway. Currently, if sending the rejection error message to the
client blocks, you are holding up a PGPROC slot until the message is
sent. The error message packet is short, so it's highly unlikely to
block, but still.

Option 3 seems kind of nice in principle, but looking at the code, it's
a bit awkward to implement. The easiest way to implement it would be to
modify send_message_to_frontend() to not call pq_flush() on FATAL
errors, and to flush the data in socket_close() instead. It's not a lot
of code, but it's a pretty ugly special case.
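
To make the ordering in option 3 concrete, here is a rough toy model in C.
It is not PostgreSQL source: ToyConn and every toy_* function are
hypothetical stand-ins for the real connection state,
send_message_to_frontend(), pq_flush() and socket_close(), and
"holds_slot" only imitates owning a PGPROC entry. The sketch assumes
the message is merely queued on FATAL and written out when the socket
is closed, after the slot has been given back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Toy model only; names and structure are illustrative assumptions. */
enum { TOY_ERROR = 1, TOY_FATAL = 2 };

typedef struct ToyConn
{
    char sendbuf[256];        /* bytes queued but not yet written out */
    char wire[256];           /* bytes the client has actually received */
    bool holds_slot;          /* stands in for owning a PGPROC entry */
    bool slot_held_at_flush;  /* was the slot still held at the last flush? */
} ToyConn;

/* Stand-in for pq_flush(): move queued bytes to the "wire". */
static void
toy_flush(ToyConn *c)
{
    assert(c != NULL);
    if (c->sendbuf[0] == '\0')
        return;               /* nothing queued */
    c->slot_held_at_flush = c->holds_slot;
    strncat(c->wire, c->sendbuf, sizeof(c->wire) - strlen(c->wire) - 1);
    c->sendbuf[0] = '\0';
}

/* Stand-in for send_message_to_frontend(): queue, but skip the flush on FATAL. */
static void
toy_send_message(ToyConn *c, int elevel, const char *msg)
{
    snprintf(c->sendbuf, sizeof(c->sendbuf), "%s", msg);
    if (elevel != TOY_FATAL)
        toy_flush(c);         /* the special case: FATAL is left queued */
}

/* Stand-in for releasing the PGPROC entry during process exit. */
static void
toy_release_slot(ToyConn *c)
{
    c->holds_slot = false;
}

/* Stand-in for socket_close(): flush whatever is still queued. */
static void
toy_socket_close(ToyConn *c)
{
    toy_flush(c);
}

/*
 * Reject a connection. Returns true if the slot had already been released
 * by the time the error message was written to the client.
 */
static bool
toy_reject_connection(ToyConn *c)
{
    toy_send_message(c, TOY_FATAL, "FATAL: sorry, too many clients already");
    toy_release_slot(c);
    toy_socket_close(c);
    return !c->slot_held_at_flush;
}
```

In this model a slow client can only delay toy_socket_close(), not the
release of the slot. The price is the elevel check inside the send
path, which is the ugly special case mentioned above.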

Option 2 seems nice too, but it looks like a lot of work.

--
Heikki Linnakangas
Neon (https://neon.tech)
