Re: walsender "wakeup storm" on PG16, likely because of bc971f4025c (Optimize walsender wake up logic using condition variables)

From: Andres Freund <andres(at)anarazel(dot)de>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: walsender "wakeup storm" on PG16, likely because of bc971f4025c (Optimize walsender wake up logic using condition variables)
Date: 2023-08-11 20:26:07
Message-ID: 20230811202607.l5evqsryq4ic2jlw@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2023-08-12 07:51:09 +1200, Thomas Munro wrote:
> Oh, I see what's happening. Maybe commit b91dd9de wasn't the best
> idea, but bc971f4025c broke an assumption, since it doesn't use
> ConditionVariableSleep(). That is confusing the signal forwarding
> logic which expects to find our entry in the wait list in the common
> case.

Hm, I guess I got confused by the cv code once more. I thought that
ConditionVariableCancelSleep() would wake us up anyway, because
once we return from ConditionVariableSleep(), we'd be off the list. But I now
realize (and I think not for the first time), that ConditionVariableSleep()
always puts us *back* on the list.

Leaving aside the issue in this thread, isn't always adding us back into the
list bad from a contention POV alone - it doubles the write traffic on the CV
and is guaranteed to cause contention for ConditionVariableBroadcast(). I
wonder if this explains some performance issues I've seen in the past.

What if we instead reset cv_sleep_target once we've been taken off the list? I
think it'd not be too hard to make it safe to do the proclist_contains()
without the spinlock. Lwlocks have something similar, there we solve it by
this sequence:

proclist_delete(&wakeup, iter.cur, lwWaitLink);

/*
* Guarantee that lwWaiting being unset only becomes visible once the
* unlink from the link has completed. Otherwise the target backend
* could be woken up for other reason and enqueue for a new lock - if
* that happens before the list unlink happens, the list would end up
* being corrupted.
*
* The barrier pairs with the LWLockWaitListLock() when enqueuing for
* another lock.
*/
pg_write_barrier();
waiter->lwWaiting = LW_WS_NOT_WAITING;
PGSemaphoreUnlock(waiter->sem);

I guess this means we'd need something like lwWaiting for CVs as well.

> From a85b2827f4500bc2e7c533feace474bc47086dfa Mon Sep 17 00:00:00 2001
> From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
> Date: Sat, 12 Aug 2023 07:06:08 +1200
> Subject: [PATCH] De-pessimize ConditionVariableCancelSleep().
>
> Commit b91dd9de was concerned with a theoretical problem with our
> non-atomic condition variable operations. If you stop sleeping, and
> then cancel the sleep in a separate step, you might be signaled in
> between, and that could be lost. That doesn't matter for callers of
> ConditionVariableBroadcast(), but callers of ConditionVariableSignal()
> might be upset if a signal went missing like this.

FWIW I suspect at least some of the places that'd have a problem without that
forwarding, might also be racy with it....

> New idea: ConditionVariableCancelSleep() can just return true if you've
> been signaled. Hypothetical users of ConditionVariableSignal() would
> then still have a way to deal with rare lost signals if they are
> concerned about that problem.

Sounds like a plan to me.

Greetings,

Andres Freund

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2023-08-11 20:59:40 Re: run pgindent on a regular basis / scripted manner
Previous Message Peter Geoghegan 2023-08-11 20:19:34 Re: AssertLog instead of Assert in some places