Report distinct wait events when waiting for WAL "operation"

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Report distinct wait events when waiting for WAL "operation"
Date: 2023-07-17 16:55:44
Message-ID: 20230717165544.4hdlynfv3jylnlm4@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

In a number of workloads one can see two wait events prominently:
LWLock:WALWrite and LWLock:WALInsert. Unfortunately for both that is not very
informative:

LWLock:WALWrite can be reported because there is genuine contention on the
LWLock, or, more commonly, because several backends are waiting for another to
finish IO. In the latter case we are not actually waiting to acquire the lock,
we are waiting for the lock to be released *without* then acquiring it.

LWLock:WALInsert can be reported because there are not enough WALInsert locks
(c.f. NUM_XLOGINSERT_LOCKS) or because we are waiting for another backend to
finish copying a WAL record into wal_buffers. In the latter case we are
therefore not waiting to acquire an LWLock.

I think both of these cases are relevant to distinguish from an operational
perspective. Secondarily, I've received many questions about making those
locks more scalable / granular, when in most of the cases the issue was not
actual lock contention.

Today it's surprisingly hard to figure out whether the issue is lock
contention or the speed of copying buffers for WAL insert locks / computing
the last prc of the CRC checksum.

Therefore I'm proposing that LWLockAcquireOrWait() and LWLockWaitForVar() not
use the "generic" LWLockReportWaitStart(), but use caller provided wait
events. The attached patch adds two new wait events for the existing callers.

I waffled a bit about which wait event section to add these to. Ended up with
"IPC", but would be happy to change that.

WAIT_EVENT_WAL_WAIT_INSERT WALWaitInsert "Waiting for WAL record to be copied into buffers."
WAIT_EVENT_WAL_WAIT_WRITE WALWaitWrite "Waiting for WAL buffers to be written or flushed to disk."

Previously it was e.g. not really possible to distinguish that something like
this:

┌────────────────┬─────────────────┬────────────┬───────┐
│ backend_type │ wait_event_type │ wait_event │ count │
├────────────────┼─────────────────┼────────────┼───────┤
│ client backend │ LWLock │ WALInsert │ 32 │
│ client backend │ (null) │ (null) │ 9 │
│ walwriter │ IO │ WALWrite │ 1 │
│ client backend │ Client │ ClientRead │ 1 │
│ client backend │ LWLock │ WALWrite │ 1 │
└────────────────┴─────────────────┴────────────┴───────┘

is a workload with a very different bottleneck than this:

┌────────────────┬─────────────────┬───────────────┬───────┐
│ backend_type │ wait_event_type │ wait_event │ count │
├────────────────┼─────────────────┼───────────────┼───────┤
│ client backend │ IPC │ WALWaitInsert │ 22 │
│ client backend │ LWLock │ WALInsert │ 13 │
│ client backend │ LWLock │ WALBufMapping │ 5 │
│ walwriter │ (null) │ (null) │ 1 │
│ client backend │ Client │ ClientRead │ 1 │
│ client backend │ (null) │ (null) │ 1 │
└────────────────┴─────────────────┴───────────────┴───────┘

even though they are very different

FWIW, the former is bottlenecked by the number of WAL insertion locks, the
second is bottlenecked by copying WAL into buffers due to needing to flush
them.

Greetings,

Andres Freund

Attachment Content-Type Size
v1-0001-Caller-specified-wait-events-for-LWLockWaitForVar.patch text/x-diff 6.4 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Garrett Thornburg 2023-07-17 17:58:07 Re: Looking for context around which event triggers are permitted
Previous Message Thom Brown 2023-07-17 16:32:51 Re: generic plans and "initial" pruning