Quick Links

Re: AIO v2.5

From:	Noah Misch <noah(at)leadboat(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, Antonin Houska <ah(at)cybertec(dot)at>
Subject:	Re: AIO v2.5
Date:	2025-03-30 02:45:13
Message-ID:	20250330024513.ac.nmisch@google.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Sat, Mar 29, 2025 at 08:39:54PM -0400, Andres Freund wrote:
> On 2025-03-29 14:29:29 -0700, Noah Misch wrote:
> > On Wed, Mar 26, 2025 at 09:07:40PM -0400, Andres Freund wrote:

> > The choice between LOG and LOG_SERVER_ONLY doesn't matter much for $SUBJECT.
> > If a client has decided to set client_min_messages that high, the client might
> > be interested in the fact that it got side-tracked completing someone else's
> > IO. On the other hand, almost none of those sidetrack events will produce
> > messages. The main argument I'd envision for LOG_SERVER_ONLY is that we
> > consider the message content sensitive, but I don't see the message content as
> > materially sensitive.
>
> I don't think it's sensitive - it just seems a bit sillier to send the same
> thing to the client twice than to the server log.

Ah, that adds weight to the benefit of LOG_SERVER_ONLY.

> I'm happy to change it to
> LOG if you prefer. Your points below mean some comments need to be updated in
> smgr/md.c anyway.

Nah.

> > > +/*
> > > + * smgrstartreadv() -- asynchronous version of smgrreadv()
> > > + *
> > > + * This starts an asynchronous readv IO using the IO handle `ioh`. Other than
> > > + * `ioh` all parameters are the same as smgrreadv().
> >
> > I would add a comment starting with:
> >
> > Compared to smgrreadv(), more responsibilities fall on layers above smgr.
> > Higher layers handle partial reads. smgr will ereport(LOG_SERVER_ONLY) some
> > problems, but higher layers are responsible for pgaio_result_report() to
> > mirror that news to the user and (for ERROR) abort the (sub)transaction.
>
> Hm - if we document that in all the smgrstart* we'd end up with something like
> that in a lot of places - but OTOH, this is the first one so far...

Alternatively, to avoid duplication:

See $PLACE for the tasks that the caller's layer owns, in contrast to smgr
owning them for smgrreadv().

> > I say "comment starting with", because I think there's a remaining decision
> > about who owns the zeroing currently tied to smgrreadv(). An audit of
> > mdreadv() vs. AIO counterparts found this part of mdreadv():
> >
> > if (nbytes == 0)
> > {
> > /*
> > * We are at or past EOF, or we read a partial block at EOF.
> > * Normally this is an error; upper levels should never try to
> > * read a nonexistent block. However, if zero_damaged_pages
> > * is ON or we are InRecovery, we should instead return zeroes
> > * without complaining. This allows, for example, the case of
> > * trying to update a block that was later truncated away.
> > */
> > if (zero_damaged_pages || InRecovery)
> > {
> >
> > I didn't write a test to prove its absence, but I'm not finding such code in
> > the AIO path.
>
> Yes, there is no such codepath
>
> A while ago I had started a thread about whether the above codepath is
> necessary

postgr.es/m/3qxxsnciyffyf3wyguiz4besdp5t5uxvv3utg75cbcszojlz7p@uibfzmnukkbd
which I had forgotten completely.

I've redone your audit, and I agree the InRecovery case is dead code.
check-world InRecovery reaches mdstartreadv() and mdreadv() only via
XLogReadBufferExtended(), vm_readbuf(), and fsm_readbuf().

The zero_damaged_pages case entails more of a judgment call about whether or
not its rule breakage eclipses its utility. Fortunately, a disappointed
zero_damaged_pages user could work around that code's absence by stopping the
server and using "dd" to extend the segment with zeros.

> I had planned to put in an error into mdreadv() at the time, but somehow lost
> track of that - I kind of mentally put this issue into the "done" category :(

> At the very least we need to add a comment about this though.

I'm fine with either of:

1. Replace that mdreadv() code with an error.

2. Update comment in mdreadv() that we're phasing out this code due the
InRecovery case's dead code status and the zero_damaged_pages problems; AIO
intentionally doesn't replicate it. Maybe add Assert(false).

I'd do (2) for v18, then do (1) first thing in v19 development.

> > > + *zeroed_or_error_count = rem_error & ((1 << 7) - 1);
> > > + rem_error >>= 7;
> >
> > These raw "7" are good places to use your new #define values. Likewise in
> > buffer_readv_encode_error().
>
> Which define value are you thinking of here? I don't think any of the ones I
> added apply? But I think you're right it'd be good to have some define for
> it, at least locally.

It was just my imagination. Withdrawn.

> It's now:
>
> /*
> * We need a backend-local completion callback for shared buffers, to be able
> * to report checksum errors correctly. Unfortunately that can only safely
> * happen if the reporting backend has previously called
> * pgstat_prepare_report_checksum_failure(), which we can only guarantee in
> * the backend that started the IO. Hence this callback.
> */

Sounds good.

> Updated to:
> /*
> * Throw a WARNING/LOG, as instructed by PIV_LOG_*, if the checksum fails,
> * but only after we've checked for the all-zeroes case.
> */
>
> I found one more, the newly added comment about checksum_failure_p was still
> talking about ignore_checksum_failure, but it should now be IGNORE_CHECKSUM_FAILURE.

That works.

> > > Subject: [PATCH v2.14 18/29] aio: Add test_aio module
> >
> > I didn't yet re-review the v2.13 or 2.14 changes to this one. That's still in
> > my queue.
>
> That's good - I think some of the tests need to expand a bit more. Since
> that's at the end of the dependency chain...

Okay, I'll delay on re-reviewing that one. When it's a good time, please put
the CF entry back in Needs Review. The patches before it are all ready for
commit after the above points of this mail.

In response to

Re: AIO v2.5 at 2025-03-30 00:39:54 from Andres Freund

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tom Lane	2025-03-30 04:04:52	Re: Amcheck verification of GiST and GIN
Previous Message	Sutou Kouhei	2025-03-30 02:31:26	Re: Make COPY format extendable: Extract COPY TO format implementations