Re: AIO v2.5

From: Noah Misch <noah(at)leadboat(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Antonin Houska <ah(at)cybertec(dot)at>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>
Subject: Re: AIO v2.5
Date: 2025-03-25 02:20:37
Message-ID: 20250325022037.91.nmisch@google.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 20, 2025 at 09:58:37PM -0400, Andres Freund wrote:
> Subject: [PATCH v2.11 09/27] bufmgr: Implement AIO read support

[I checked that v2.12 doesn't invalidate these review comments, but I didn't
technically rebase the review onto v2.12's line numbers.]

> static void
> TerminateBufferIO(BufferDesc *buf, bool clear_dirty, uint32 set_flag_bits,
> - bool forget_owner)
> + bool forget_owner, bool syncio)
> {
> uint32 buf_state;
>
> @@ -5586,6 +5636,14 @@ TerminateBufferIO(BufferDesc *buf, bool clear_dirty, uint32 set_flag_bits,
> if (clear_dirty && !(buf_state & BM_JUST_DIRTIED))
> buf_state &= ~(BM_DIRTY | BM_CHECKPOINT_NEEDED);
>
> + if (!syncio)
> + {
> + /* release ownership by the AIO subsystem */
> + Assert(BUF_STATE_GET_REFCOUNT(buf_state) > 0);
> + buf_state -= BUF_REFCOUNT_ONE;
> + pgaio_wref_clear(&buf->io_wref);
> + }

Looking at the callers:

ZeroAndLockBuffer[1083] TerminateBufferIO(bufHdr, false, BM_VALID, true, true);
ExtendBufferedRelShared[2869] TerminateBufferIO(buf_hdr, false, BM_VALID, true, true);
FlushBuffer[4827] TerminateBufferIO(buf, true, 0, true, true);
AbortBufferIO[6637] TerminateBufferIO(buf_hdr, false, BM_IO_ERROR, false, true);
buffer_readv_complete_one[7279] TerminateBufferIO(buf_hdr, false, set_flag_bits, false, false);
buffer_writev_complete_one[7427] TerminateBufferIO(buf_hdr, clear_dirty, set_flag_bits, false, false);

I think we can improve on the "syncio" arg name. The first two aren't doing
IO, and AbortBufferIO() may be cleaning up what would have been an AIO if it
hadn't failed early. Perhaps name the arg "release_aio" and pass
release_aio=true instead of syncio=false (release_aio = !syncio).

> + * about which buffers are target by IO can be hard to debug, making

s/target/targeted/

> +static pg_attribute_always_inline PgAioResult
> +buffer_readv_complete_one(uint8 buf_off, Buffer buffer, uint8 flags,
> + bool failed, bool is_temp)
> +{
...
> + if ((flags & READ_BUFFERS_ZERO_ON_ERROR) || zero_damaged_pages)
> + {
> + ereport(WARNING,
> + (errcode(ERRCODE_DATA_CORRUPTED),
> + errmsg("invalid page in block %u of relation %s; zeroing out page",

My earlier review requested s/LOG/WARNING/, but I wasn't thinking about this
in full depth. In the !is_temp case, this runs in a complete_shared callback.
A process unrelated to the original IO may run this callback. That's
unfortunate in two ways. First, that other process's client gets an
unexpected WARNING. The process getting the WARNING may not even have
zero_damaged_pages enabled. Second, the client of the process that staged the
IO gets no message.

AIO ERROR-level messages handle this optimally. We emit a LOG-level message
in the process that runs the complete_shared callback, and we arrange for the
ERROR-level message in the stager. That would be ideal here: LOG in the
complete_shared runner, WARNING in the stager.

One could simplify things by forcing io_method=sync under ZERO_ON_ERROR ||
zero_damaged_pages, perhaps as a short-term approach.

Thoughts?

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message James Hunter 2025-03-25 02:23:26 Re: a pool for parallel worker
Previous Message Hayato Kuroda (Fujitsu) 2025-03-25 02:10:55 RE: pg_recvlogical requires -d but not described on the documentation