From: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
---|---|
To: | andres(at)anarazel(dot)de |
Cc: | pgsql-hackers(at)postgresql(dot)org, alvherre(at)2ndquadrant(dot)com, horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp |
Subject: | Re: failure in 019_replslot_limit |
Date: | 2023-04-06 03:09:18 |
Message-ID: | 20230406.120918.290021952458887247.horikyota.ntt@gmail.com |
Lists: | pgsql-hackers |
At Wed, 5 Apr 2023 11:55:14 -0700, Andres Freund <andres(at)anarazel(dot)de> wrote in
> Hi,
>
> On 2023-04-05 11:48:53 -0700, Andres Freund wrote:
> > Note that a checkpoint started at "17:50:23.787", but didn't finish before the
> > database was shut down. As far as I can tell, this can not be caused by
> > checkpoint_timeout, because by the time we get to invalidating replication
> > slots, we already did CheckPointBuffers(), and that's the only thing that
> > delays based on checkpoint_timeout.
> >
> > ISTM that this indicates that checkpointer got stuck after signalling
> > 344783.
> >
> > Do you see any other explanation?
>
> This all sounded vaguely familiar. After a bit of digging I found this:
>
> https://postgr.es/m/20220223014855.4lsddr464i7mymk2%40alap3.anarazel.de
>
> Which seems like it plausibly explains the failed test?
As I understand it, ConditionVariableSleep() can return on spurious
wakeups, and holding ReplicationSlotControlLock does not prevent the
slot from being released. So I can imagine how the checkpointer could
block indefinitely here. Suppose ConditionVariableSleep(&s->active_cv)
returns early because the latch was set for some reason other than the
CV broadcast, and the target process releases the slot between our
fetching active_pid in the loop and the following call to
ConditionVariablePrepareToSleep(). In that case the CV broadcast
triggered by the slot release can be missed, and the next sleep never
ends. If that is what happened, we need to check active_pid again
after calling ConditionVariablePrepareToSleep().
Does this make sense?
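To make the ordering concrete, here is a minimal single-threaded C model of the interleaving described above. It does not use PostgreSQL's ConditionVariable API: MockCV, MockSlot, mock_prepare_to_sleep(), mock_slot_release() and replay_race() are hypothetical names, and the model assumes that a broadcast only reaches a waiter that has already registered through the prepare step. It replays the case where the slot holder releases the slot after active_pid has been fetched but before the waiter registers; only the variant that rechecks active_pid after registering makes progress.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical single-waiter condition variable model.
 * Assumption: a broadcast only reaches a waiter that has already
 * registered (the analogue of ConditionVariablePrepareToSleep()).
 */
typedef struct
{
    bool waiter_registered;  /* waiter is on the wakeup list */
    bool wakeup_pending;     /* a broadcast reached the registered waiter */
} MockCV;

/* Hypothetical slot model: active_pid is nonzero while the slot is held. */
typedef struct
{
    int    active_pid;
    MockCV active_cv;
} MockSlot;

static void
mock_prepare_to_sleep(MockCV *cv)
{
    cv->waiter_registered = true;
}

static void
mock_broadcast(MockCV *cv)
{
    /* A waiter that has not registered yet never hears this broadcast. */
    if (cv->waiter_registered)
        cv->wakeup_pending = true;
}

/* The holder releases the slot and then broadcasts on its CV. */
static void
mock_slot_release(MockSlot *slot)
{
    slot->active_pid = 0;
    mock_broadcast(&slot->active_cv);
}

/* True if a sleep would end, false if the waiter would block forever. */
static bool
mock_sleep_would_return(const MockCV *cv)
{
    return cv->wakeup_pending;
}

/*
 * Replays the race: the waiter fetches active_pid, then the holder
 * releases the slot before the waiter registers on the CV.
 * With recheck, the waiter re-reads active_pid after registering.
 * Returns true if the waiter makes progress, false if it would hang.
 */
static bool
replay_race(bool recheck)
{
    MockSlot slot = {.active_pid = 344783, .active_cv = {false, false}};

    int active_pid = slot.active_pid;   /* waiter fetches active_pid */
    assert(active_pid != 0);            /* so it decides to wait */

    mock_slot_release(&slot);           /* broadcast goes unheard */

    mock_prepare_to_sleep(&slot.active_cv);

    if (recheck && slot.active_pid == 0)
        return true;                    /* slot already free: skip the sleep */

    return mock_sleep_would_return(&slot.active_cv);
}
```

In this model replay_race(false) returns false (the waiter would sleep forever on a broadcast that already happened), while replay_race(true) returns true, which is the effect the recheck after ConditionVariablePrepareToSleep() is meant to have.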
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center