From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
Cc: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: BF animal malleefowl reported an failure in 001_password.pl |
Date: | 2023-01-14 07:55:37 |
Message-ID: | 934208.1673682937@sss.pgh.pa.us |
Lists: | pgsql-hackers |
"houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> writes:
> I noticed one BF failure[1] when monitoring the BF for some other commit.
> [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=malleefowl&dt=2023-01-13%2009%3A54%3A51
> ...
> So it seems the connection happens before pg_ident.conf is actually reloaded?
> Not sure if we need to do something to make sure the reload happens, because it
> looks like a very rare failure which hasn't happened in the last 90 days.

That does look like a race condition between config reloading and
new-backend launching. However, I can't help being suspicious about
the fact that we haven't seen this symptom before and now here it is
barely a day after 7389aad63 (Use WaitEventSet API for postmaster's
event loop). It seems fairly plausible that that did something that
causes the postmaster to preferentially process connection-accept ahead
of SIGHUP. I took a quick look through the code and did not see a
smoking gun, but I'm way too tired to be sure I didn't miss something.

In general, use of WaitEventSet instead of signals will tend to slot
the postmaster into non-temporally-ordered event responses in two
ways: (1) the latch.c code will report events happening at more-or-less
the same time in a specific order, and (2) the postmaster.c code will
react to signal-handler-set flags in a specific order. AFAICS, both
of those code layers will prioritize latch events ahead of
connection-accept events, but did I misread it?
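
To make the two ordering layers concrete, here is a minimal sketch that models the reading described above. It is not the actual latch.c or postmaster.c code; every type and function name in it (`EventKind`, `PendingFlags`, `report_events`, `service_events`) is hypothetical and chosen only for illustration. Layer 1 reports simultaneously ready events in a fixed order with the latch first, and layer 2 services the signal-handler-set flags in a fixed order before it gets to the connection-accept event.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event kinds; these are NOT PostgreSQL's identifiers. */
typedef enum { EV_LATCH, EV_ACCEPT } EventKind;

/* Hypothetical flags that signal handlers would set before setting the latch. */
typedef struct
{
    bool reload_pending;      /* set by the SIGHUP handler */
    bool child_exit_pending;  /* set by the SIGCHLD handler */
} PendingFlags;

typedef enum { ACT_RELOAD, ACT_REAP, ACT_ACCEPT } Action;

/*
 * Layer 1 (stand-in for the wait-event code): events that became ready at
 * more or less the same time are reported in a fixed order, latch first.
 * "out" must have room for 2 entries.
 */
size_t
report_events(bool latch_set, bool socket_ready, EventKind *out)
{
    size_t n = 0;

    if (latch_set)
        out[n++] = EV_LATCH;
    if (socket_ready)
        out[n++] = EV_ACCEPT;
    return n;
}

/*
 * Layer 2 (stand-in for the event loop): walk the reported events in order.
 * For a latch event, service the pending flags in a fixed order (reload
 * before child exit) and clear each one. "out" must have room for two
 * actions per latch event plus one per accept event.
 */
size_t
service_events(const EventKind *events, size_t nevents,
               PendingFlags *flags, Action *out)
{
    size_t n = 0;

    for (size_t i = 0; i < nevents; i++)
    {
        if (events[i] == EV_LATCH)
        {
            if (flags->reload_pending)
            {
                flags->reload_pending = false;
                out[n++] = ACT_RELOAD;
            }
            if (flags->child_exit_pending)
            {
                flags->child_exit_pending = false;
                out[n++] = ACT_REAP;
            }
        }
        else if (events[i] == EV_ACCEPT)
            out[n++] = ACT_ACCEPT;
    }
    return n;
}
```

Under this model, a SIGHUP whose flag is already set when the events are collected is serviced before the accept. The sketch does not cover a flag that gets set only after the events have been collected.
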
Also it seems like the various platform-specific code paths in latch.c
could diverge as to the priority order of events, which could cause
annoying platform-specific behavior. Not sure there's much to be
done there other than to be sensitive to not letting such divergence
happen.
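
One way to stay sensitive to such divergence would be a small ordering check that a test could apply to the event list each platform-specific path returns. The following is only a sketch under that assumption; `ReportedEvent` and `latch_events_come_first` are hypothetical names and do not exist in latch.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of what a wait call reported. */
typedef enum { EVENT_LATCH, EVENT_SOCKET } ReportedEvent;

/*
 * Return true if no socket event is reported ahead of a latch event,
 * i.e. the list respects "latch first" ordering. An empty list passes.
 */
bool
latch_events_come_first(const ReportedEvent *events, size_t nevents)
{
    bool seen_socket = false;

    for (size_t i = 0; i < nevents; i++)
    {
        if (events[i] == EVENT_SOCKET)
            seen_socket = true;
        else if (events[i] == EVENT_LATCH && seen_socket)
            return false;   /* a latch event came after a socket event */
    }
    return true;
}
```

A check like this only compares the reported order on each platform. It says nothing about which order is the right one.
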
regards, tom lane