Quick Links

Re: AIO v2.5

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Noah Misch <noah(at)leadboat(dot)com>
Cc:	Antonin Houska <ah(at)cybertec(dot)at>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>
Subject:	Re: AIO v2.5
Date:	2025-03-25 02:52:19
Message-ID:	brd4ebxlbnnehjb2jifkme4hovdmxszxqnnfbuoblrarkgvepu@ydfqaoe3zime
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi,

On 2025-03-24 19:20:37 -0700, Noah Misch wrote:
> On Thu, Mar 20, 2025 at 09:58:37PM -0400, Andres Freund wrote:
> > static void
> > TerminateBufferIO(BufferDesc *buf, bool clear_dirty, uint32 set_flag_bits,
> > - bool forget_owner)
> > + bool forget_owner, bool syncio)
> > ...
> Looking at the callers:
>
> ZeroAndLockBuffer[1083] TerminateBufferIO(bufHdr, false, BM_VALID, true, true);
> ExtendBufferedRelShared[2869] TerminateBufferIO(buf_hdr, false, BM_VALID, true, true);
> FlushBuffer[4827] TerminateBufferIO(buf, true, 0, true, true);
> AbortBufferIO[6637] TerminateBufferIO(buf_hdr, false, BM_IO_ERROR, false, true);
> buffer_readv_complete_one[7279] TerminateBufferIO(buf_hdr, false, set_flag_bits, false, false);
> buffer_writev_complete_one[7427] TerminateBufferIO(buf_hdr, clear_dirty, set_flag_bits, false, false);
>
> I think we can improve on the "syncio" arg name. The first two aren't doing
> IO, and AbortBufferIO() may be cleaning up what would have been an AIO if it
> hadn't failed early. Perhaps name the arg "release_aio" and pass
> release_aio=true instead of syncio=false (release_aio = !syncio).

Yes, I think that makes sense. Will do that tomorrow.

> > +static pg_attribute_always_inline PgAioResult
> > +buffer_readv_complete_one(uint8 buf_off, Buffer buffer, uint8 flags,
> > + bool failed, bool is_temp)
> > +{
> ...
> > + if ((flags & READ_BUFFERS_ZERO_ON_ERROR) || zero_damaged_pages)
> > + {
> > + ereport(WARNING,
> > + (errcode(ERRCODE_DATA_CORRUPTED),
> > + errmsg("invalid page in block %u of relation %s; zeroing out page",
>
> My earlier review requested s/LOG/WARNING/, but I wasn't thinking about this
> in full depth. In the !is_temp case, this runs in a complete_shared
> callback. A process unrelated to the original IO may run this callback.
> That's unfortunate in two ways. First, that other process's client gets an
> unexpected WARNING. The process getting the WARNING may not even have
> zero_damaged_pages enabled. Second, the client of the process that staged
> the IO gets no message.

Ah, right. That could be why I had flipped it. If so, shame on me for not
adding a comment...

> AIO ERROR-level messages handle this optimally. We emit a LOG-level message
> in the process that runs the complete_shared callback, and we arrange for the
> ERROR-level message in the stager. That would be ideal here: LOG in the
> complete_shared runner, WARNING in the stager.

We could obviously downgrade (crossgrade? A LOG is more severe than a LOG in
some ways, but not others) the message when run in a different backend fairly
easily. Still emitting a WARNING in the stager however is a bit more tricky.

Before thinking more deeply about how we could emit WARNING in the stage:

Is it actually sane to use WARNING here? At least for ZERO_ON_ERROR that could
trigger a rather massive flood of messages to the client in a *normal*
situation. I'm thinking of something like an insert extending a relation some
time after an immediate restart and encountering a lot of FSM corruption (due
to its non-crash-safe-ness) during the search for free space and the
subsequent FSM vacuum. It might be ok to LOG that, but sending a lot of
WARNINGs to the client seems not quite right.

If we want to implement it, I think we could introduce PGAIO_RS_WARN, which
then could tell the stager to issue the WARNING. It would add a bit of
distributed cost, both to callbacks and users of AIO, but it might not be too
bad.

> One could simplify things by forcing io_method=sync under ZERO_ON_ERROR ||
> zero_damaged_pages, perhaps as a short-term approach.

Yea, that could work. Perhaps even just for zero_damaged_pages, after
changing it so that ZERO_ON_ERROR always just LOGs.

Hm, it seems somewhat nasty to have rather different performance
characteristics when forced to use zero_damaged_pages to recover from a
problem. Imagine an instance that's configured to use DIO and then needs to
use zero_damaged_pages to recove from corruption...

/me adds writing a test for both ZERO_ON_ERROR and zero_damaged_pages to the
TODO.

Greetings,

Andres Freund

In response to

Re: AIO v2.5 at 2025-03-25 02:20:37 from Noah Misch

Responses

Re: AIO v2.5 at 2025-03-25 14:11:20 from Noah Misch

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tom Lane	2025-03-25 02:56:30	Re: query_id: jumble names of temp tables for better pg_stat_statement UX
Previous Message	Sami Imseih	2025-03-25 02:38:35	Re: query_id: jumble names of temp tables for better pg_stat_statement UX