Re: Refactoring postmaster's code to cleanup after child exit

From: Noah Misch <noah(at)leadboat(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(at)vondra(dot)me>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Refactoring postmaster's code to cleanup after child exit
Date: 2025-03-06 04:49:33
Message-ID: 20250306044933.7a.nmisch@google.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 04, 2025 at 05:50:34PM -0500, Andres Freund wrote:
> On 2024-12-09 00:12:32 +0100, Tomas Vondra wrote:
> > [23:48:44.444](1.129s) ok 3 - reserved_connections limit
> > [23:48:44.445](0.001s) ok 4 - reserved_connections limit: matches
> > process ended prematurely at
> > /home/user/work/postgres/src/test/postmaster/../../../src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
> > line 154.
> > # Postmaster PID for node "primary" is 198592
>
>
> I just saw this failure on skink in the BF:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-03-04%2015%3A43%3A23
>
> [17:05:56.438](0.247s) ok 3 - reserved_connections limit
> [17:05:56.438](0.000s) ok 4 - reserved_connections limit: matches
> process ended prematurely at /home/bf/bf-build/skink-master/HEAD/pgsql/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm line 160.
>
>
> > That BackgroundPsql.pm line is this in wait_connect()
> >
> > $self->{run}->pump()
> > until $self->{stdout} =~ /$banner/ || $self->{timeout}->is_expired;
>
> A big part of the problem here imo is the exception behaviour that
> IPC::Run::pump() has:
>
> If pump() is called after all harnessed activities have completed, a "process
> ended prematurely" exception to be thrown. This allows for simple scripting
> of external applications without having to add lots of error handling code at
> each step of the script:
>
> Which is, uh, not very compatible with how we use IPC::Run (here and
> elsewhere). Just ending the test because a connection failed is pretty awful.

Historically, I think we've avoided this sort of trouble by doing pipe I/O
only on processes where we feel able to predict when the process will exit.
Commit f44b9b6 is one example (simpler case, not involving pump()). It would
be a nice improvement to do better, since there's always some risk of
unexpected exit.

> This behaviour makes it really hard to debug problems. It'd have been a lot
> easier to understand the problem if we'd seen psql's stderr before the test
> died.
>
> I guess that mean at the very least we'd need to put an eval {} around the
> ->pump() call., print $self->{stdout}, ->{stderr} and reraise an error?

That sounds right.

Officially, you could call ->pumpable() before ->pump(). It's defined as
'Returns TRUE if calling pump() won't throw an immediate "process ended
prematurely" exception.' I lack high confidence that it avoids the exception,
because the pump() still calls pumpable()->reap_nb()->waitpid(WNOHANG) and may
decide "process ended prematurely" based on the new finding. In other words,
I bet there would be a TOCTOU defect in "$h->pump if $h->pumpable".

> Presumably not just in in wait_connect(), but also at least in pump_until()?

If the goal is to have it capture maximum data from processes that exit when
we don't expect it (seems good to me), yes.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Sharma 2025-03-06 05:03:26 Re: Orphaned users in PG16 and above can only be managed by Superusers
Previous Message Michael Paquier 2025-03-06 04:20:37 Re: Add Pipelining support in psql