Re: BF animal malleefowl reported an failure in 001_password.pl

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: BF animal malleefowl reported an failure in 001_password.pl
Date: 2023-01-16 22:24:23
Message-ID: CA+hUKGKykFAoj3Ydyi84aXyQc-mFgPKPadQ2ppsGMqhzcAxDNA@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jan 15, 2023 at 12:35 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> Here's a sketch of the first idea.

To hit the problem case, the signal needs to arrive in between the
latch->is_set check and the epoll_wait() call, and the handler needs
to take a while to get started. (If it arrives before the
latch->is_set check we report WL_LATCH_SET immediately, and if it
arrives after the epoll_wait() call begins, we get EINTR and go back
around to the latch->is_set check.) With some carefully placed sleeps
to simulate a CPU-starved system (see attached) I managed to get a
kill-then-connect sequence to produce:

2023-01-17 10:48:32.508 NZDT [555849] LOG: nevents = 2
2023-01-17 10:48:32.508 NZDT [555849] LOG: events[0] = WL_SOCKET_ACCEPT
2023-01-17 10:48:32.508 NZDT [555849] LOG: events[1] = WL_LATCH_SET
2023-01-17 10:48:32.508 NZDT [555849] LOG: received SIGHUP, reloading configuration files

With the patch I posted, we process that in the order we want:

2023-01-17 11:06:31.340 NZDT [562262] LOG: nevents = 2
2023-01-17 11:06:31.340 NZDT [562262] LOG: events[1] = WL_LATCH_SET
2023-01-17 11:06:31.340 NZDT [562262] LOG: received SIGHUP, reloading configuration files
2023-01-17 11:06:31.344 NZDT [562262] LOG: events[0] = WL_SOCKET_ACCEPT
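
For illustration only, here is a minimal sketch of that "latch events first" ordering. It is not the posted patch: DemoEvent, EV_LATCH_SET, EV_SOCKET_ACCEPT and latch_first_order() are hypothetical stand-ins for the real WaitEvent array and WL_* flags, and the function only computes the order in which entries would be handled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flags standing in for PostgreSQL's WL_* constants. */
enum { EV_LATCH_SET = 1 << 0, EV_SOCKET_ACCEPT = 1 << 1 };

/* Hypothetical, simplified stand-in for a wait event. */
typedef struct { int events; } DemoEvent;

/*
 * Fill order[] with indexes into events[] in the order they would be
 * handled: all latch events first, then everything else.  Returns the
 * number of entries written, which always equals nevents.
 */
size_t
latch_first_order(const DemoEvent *events, size_t nevents, size_t *order)
{
    size_t n = 0;

    assert(events != NULL || nevents == 0);

    /* Pass 1: latch events, so pending signal work is done first. */
    for (size_t i = 0; i < nevents; i++)
        if (events[i].events & EV_LATCH_SET)
            order[n++] = i;

    /* Pass 2: the remaining events, e.g. sockets ready to accept. */
    for (size_t i = 0; i < nevents; i++)
        if (!(events[i].events & EV_LATCH_SET))
            order[n++] = i;

    return n;
}
```

With the two events from the log above (events[0] a socket accept, events[1] the latch), this yields the order 1, 0, so the reload is handled before the new connection.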

Other thoughts:

Another idea would be to teach the latch infrastructure itself to
magically swap latch events to position 0. Latches are usually
prioritised; it's only in this rare race case that they are not.
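
A rough sketch of what such a swap could look like, assuming the wait code holds the returned events in a plain array. SketchEvent, SK_LATCH_SET, SK_SOCKET_ACCEPT and move_latch_to_front() are hypothetical names for illustration, not existing latch.c code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flags standing in for PostgreSQL's WL_* constants. */
enum { SK_LATCH_SET = 1 << 0, SK_SOCKET_ACCEPT = 1 << 1 };

/* Hypothetical, simplified stand-in for a returned wait event. */
typedef struct { int events; } SketchEvent;

/*
 * If a latch event was returned anywhere other than position 0, swap it
 * with the event at position 0.  Returns 1 if a swap happened, 0 if the
 * latch event was already first or there was none.
 */
int
move_latch_to_front(SketchEvent *events, size_t nevents)
{
    assert(events != NULL || nevents == 0);

    for (size_t i = 0; i < nevents; i++)
    {
        if (events[i].events & SK_LATCH_SET)
        {
            SketchEvent tmp;

            if (i == 0)
                return 0;       /* already prioritised */

            tmp = events[0];
            events[0] = events[i];
            events[i] = tmp;
            return 1;
        }
    }
    return 0;                   /* no latch event in this batch */
}
```

Doing this inside the wait code would let every caller keep a simple front-to-back loop over the events, at the cost of one extra scan per wakeup.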

Or, going the other way: I realise that we're lacking a "wait for
reload" mechanism, as discussed in other threads. (Usually people want
this because they care about a reload's effects on backends other than
the postmaster, where all bets are off, and Andres once suggested the
ProcSignalBarrier hammer.) If we ever got something like that, it
might be another solution to this particular problem.

Attachment: x.patch (text/x-patch, 1.5 KB)
