From: Noah Misch <noah(at)leadboat(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Antonin Houska <ah(at)cybertec(dot)at>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>
Subject: Re: AIO v2.5
Date: 2025-03-25 14:11:20
Message-ID: 20250325141120.8e.nmisch@google.com
Lists: pgsql-hackers

On Mon, Mar 24, 2025 at 10:52:19PM -0400, Andres Freund wrote:
> On 2025-03-24 19:20:37 -0700, Noah Misch wrote:
> > On Thu, Mar 20, 2025 at 09:58:37PM -0400, Andres Freund wrote:
> > > +static pg_attribute_always_inline PgAioResult
> > > +buffer_readv_complete_one(uint8 buf_off, Buffer buffer, uint8 flags,
> > > +						  bool failed, bool is_temp)
> > > +{
> > ...
> > > +	if ((flags & READ_BUFFERS_ZERO_ON_ERROR) || zero_damaged_pages)
> > > +	{
> > > +		ereport(WARNING,
> > > +				(errcode(ERRCODE_DATA_CORRUPTED),
> > > +				 errmsg("invalid page in block %u of relation %s; zeroing out page",
> >
> > My earlier review requested s/LOG/WARNING/, but I wasn't thinking about this
> > in full depth. In the !is_temp case, this runs in a complete_shared
> > callback. A process unrelated to the original IO may run this callback.
> > That's unfortunate in two ways. First, that other process's client gets an
> > unexpected WARNING. The process getting the WARNING may not even have
> > zero_damaged_pages enabled. Second, the client of the process that staged
> > the IO gets no message.
>
> Ah, right. That could be why I had flipped it. If so, shame on me for not
> adding a comment...
>
>
> > AIO ERROR-level messages handle this optimally. We emit a LOG-level message
> > in the process that runs the complete_shared callback, and we arrange for the
> > ERROR-level message in the stager. That would be ideal here: LOG in the
> > complete_shared runner, WARNING in the stager.
>
> We could obviously downgrade (crossgrade? A LOG is more severe than a WARNING
> in some ways, but not others) the message when run in a different backend fairly
> easily. Still emitting a WARNING in the stager however is a bit more tricky.
>
> Before thinking more deeply about how we could emit WARNING in the stager:
>
> Is it actually sane to use WARNING here? At least for ZERO_ON_ERROR that could
> trigger a rather massive flood of messages to the client in a *normal*
> situation. I'm thinking of something like an insert extending a relation some
> time after an immediate restart and encountering a lot of FSM corruption (due
> to its non-crash-safe-ness) during the search for free space and the
> subsequent FSM vacuum. It might be ok to LOG that, but sending a lot of
> WARNINGs to the client seems not quite right.

Orthogonal to AIO, I do think LOG (or even DEBUG1?) is better for
ZERO_ON_ERROR. The ZERO_ON_ERROR case also should not use
ERRCODE_DATA_CORRUPTED. (That errcode shouldn't appear for business as usual.
It should signify wrong or irretrievable query results, essentially.)

For zero_damaged_pages, WARNING seems at least defensible, and
ERRCODE_DATA_CORRUPTED is right. It wouldn't be the worst thing to change
zero_damaged_pages to LOG and let the complete_shared runner log it, as long
as we release-note that. It's superuser-only, and the superuser can learn to
check the log. One typically should use zero_damaged_pages in one session at
a time, so the logs won't be too confusing.

Another thought on complete_shared running on other backends: I wonder if we
should push an ErrorContextCallback that adds "CONTEXT: completing I/O of
other process" or similar, so people wonder less about how "SELECT FROM a" led
to a log message about IO on table "b".

> If we want to implement it, I think we could introduce PGAIO_RS_WARN, which
> then could tell the stager to issue the WARNING. It would add a bit of
> distributed cost, both to callbacks and users of AIO, but it might not be too
> bad.
>
>
> > One could simplify things by forcing io_method=sync under ZERO_ON_ERROR ||
> > zero_damaged_pages, perhaps as a short-term approach.
>
> Yea, that could work. Perhaps even just for zero_damaged_pages, after
> changing it so that ZERO_ON_ERROR always just LOGs.

Yes.

> Hm, it seems somewhat nasty to have rather different performance
> characteristics when forced to use zero_damaged_pages to recover from a
> problem. Imagine an instance that's configured to use DIO and then needs to
> use zero_damaged_pages to recover from corruption...

True. I'd be willing to bet high-scale use of zero_damaged_pages is rare. By
high scale, I mean something like reading a whole large table, as opposed to a
TID scan of the known-problematic range. That said, people (including me)
expect the emergency tools to be good even if they're used rarely. You're not
wrong to worry about it.
