From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Subject: | Re: Race conditions with checkpointer and shutdown |
Date: | 2019-04-17 00:21:32 |
Message-ID: | 20190417002132.ukkvrch3tsuqzw7x@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2019-04-16 17:05:36 -0700, Andres Freund wrote:
> On 2019-04-16 18:59:37 -0400, Robert Haas wrote:
> > On Tue, Apr 16, 2019 at 6:45 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > > Do we need to think harder about establishing rules for multiplexed
> > > use of the process latch? I'm imagining some rule like "if you are
> > > not the outermost event loop of a process, you do not get to
> > > summarily clear MyLatch. Make sure to leave it set after waiting,
> > > if there was any possibility that it was set by something other than
> > > the specific event you're concerned with".
> >
> > Hmm, yeah. If the latch is left set, then the outer loop will just go
> > through an extra and unnecessary iteration, which seems fine. If the
> > latch is left clear, then the outer loop might miss a wakeup intended
> > for it and hang forever.
>
> Arguably that's a sign that the latch using code in the outer loop(s) isn't
> written correctly? If you do it as:
>
> while (true)
> {
> CHECK_FOR_INTERRUPTS();
>
> ResetLatch(MyLatch);
>
> if (work_needed)
> {
> Plenty();
> Code();
> Using(MyLatch);
> }
> else
> {
> WaitLatch(MyLatch);
> }
> }
>
> I think that's not a danger? I think the problem really is that we
> suggest doing that WaitLatch() unconditionally:
>
> * The correct pattern to wait for event(s) is:
> *
> * for (;;)
> * {
> * ResetLatch();
> * if (work to do)
> * Do Stuff();
> * WaitLatch();
> * }
> *
> * It's important to reset the latch *before* checking if there's work to
> * do. Otherwise, if someone sets the latch between the check and the
> * ResetLatch call, you will miss it and Wait will incorrectly block.
> *
> * Another valid coding pattern looks like:
> *
> * for (;;)
> * {
> * if (work to do)
> * Do Stuff(); // in particular, exit loop if some condition satisfied
> * WaitLatch();
> * ResetLatch();
> * }
>
> Obviously there's the issue that a lot of latch using code isn't written
> that way - but I also don't think there's that many latch using code
> that then also uses latch. Seems like we could fix that. While it has
> obviously dangers of not being followed, so does the
> 'always-set-latch-unless-outermost-loop' approach.
>
> I'm not sure I like the idea of incurring another unnecessary SetLatch()
> call for most latch using places.
>
> I guess there's a bit bigger danger of taking longer to notice
> postmaster-death. But I'm not sure I can quite see that being
> problematic - seems like all we should incur is another cycle through
> the loop, as the latch shouldn't be set anymore.
I think we should thus change our latch documentation to say:
something like:
diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
index fc995819d35..dc46dd94c5b 100644
--- a/src/include/storage/latch.h
+++ b/src/include/storage/latch.h
@@ -44,22 +44,31 @@
* {
* ResetLatch();
* if (work to do)
- * Do Stuff();
- * WaitLatch();
+ * DoStuff();
+ * else
+ * WaitLatch();
* }
*
* It's important to reset the latch *before* checking if there's work to
* do. Otherwise, if someone sets the latch between the check and the
* ResetLatch call, you will miss it and Wait will incorrectly block.
*
+ * The reason to only wait on the latch in case there is nothing to do is that
+ * code inside DoStuff() might use the same latch, and leave it reset, even
+ * though a SetLatch() aimed for the outer loop arrived. Which again could
+ * lead to incorrectly blocking in Wait.
+ *
* Another valid coding pattern looks like:
*
* for (;;)
* {
* if (work to do)
- * Do Stuff(); // in particular, exit loop if some condition satisfied
- * WaitLatch();
- * ResetLatch();
+ * DoStuff(); // in particular, exit loop if some condition satisfied
+ * else
+ * {
+ * WaitLatch();
+ * ResetLatch();
+ * }
* }
*
* This is useful to reduce latch traffic if it's expected that the loop's
and adapt code to match (at least in the outer loops).
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2019-04-17 00:43:42 | Re: REINDEX CONCURRENTLY 2.0 |
Previous Message | Andres Freund | 2019-04-17 00:05:36 | Re: Race conditions with checkpointer and shutdown |