Re: AIO v2.5

From: Andres Freund <andres(at)anarazel(dot)de>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Antonin Houska <ah(at)cybertec(dot)at>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>
Subject: Re: AIO v2.5
Date: 2025-03-26 20:33:49
Message-ID: zfkgslbfxyrzntjclthxa5jo75es3qqt5sdavci4givvszp4u3@iwu2ciwdxhcb
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2025-03-25 17:19:15 -0700, Noah Misch wrote:
> On Mon, Mar 24, 2025 at 09:18:06PM -0400, Andres Freund wrote:
> > @@ -296,7 +299,9 @@ pgaio_io_call_complete_local(PgAioHandle *ioh)
> >
> > /*
> > * Note that we don't save the result in ioh->distilled_result, the local
> > - * callback's result should not ever matter to other waiters.
> > + * callback's result should not ever matter to other waiters. However, the
> > + * local backend does care, so we return the result as modified by local
> > + * callbacks, which then can be passed to ioh->report_return->result.
> > */
> > pgaio_debug_io(DEBUG3, ioh,
> > "after local completion: distilled result: (status %s, id %u, error_data %d, result %d), raw_result: %d",
>
> Should this debug message remove the word "distilled", since this commit
> solidifies distilled_result as referring to the complete_shared result?

Good point, updated.

> > Subject: [PATCH v2.12 01/28] aio: Be more paranoid about interrupts
> Ready for commit
> > Subject: [PATCH v2.12 02/28] aio: Pass result of local callbacks to
> > ->report_return
>
> Ready for commit w/ up to one cosmetic change:
>

And pushed. Together with the s/pgaio_io_prep_/s/pgaio_io_start_/ renaming
we've been discussing. Btw, I figured out the origin of that, I was just
mirroring the liburing API...

Thanks again for the reviews.

> > Subject: [PATCH v2.12 03/28] aio: Add liburing dependency
>
> Ready for commit
>
>
> > Subject: [PATCH v2.12 04/28] aio: Add io_method=io_uring
>
> Ready for commit w/ open_fd.fixup

Yay. Planning to push those soon.

> > Subject: [PATCH v2.12 05/28] aio: Implement support for reads in smgr/md/fd
>
> Ready for commit w/ up to two cosmetic changes:

Cool.

> > +/*
> > + * AIO error reporting callback for mdstartreadv().
> > + *
> > + * Errors are encoded as follows:
> > + * - PgAioResult.error_data != 0 encodes IO that failed with errno != 0
>
> I recommend replacing "errno != 0" with either "that errno" or "errno ==
> error_data".

Applied.

> Second, the aio_internal.h comment changes discussed in
> postgr.es/m/20250325155808.f7.nmisch@google.com and earlier.

Here's my current version of that:

* Note that the externally visible functions to start IO
* (e.g. FileStartReadV(), via pgaio_io_start_readv()) move an IO from
* PGAIO_HS_HANDED_OUT to at least PGAIO_HS_STAGED and at most
* PGAIO_HS_COMPLETED_LOCAL (at which point the handle will be reused).

Does that work?

I think I'll push that as part of the comment updates patch instead of
"Implement support for reads in smgr/md/fd", unless you see a reason to do so
differently. I'd have done it in the patch to s/prep/start/, but then it would
reference functions that don't exist yet...

> > Subject: [PATCH v2.12 06/28] aio: Add README.md explaining higher level design
>
> Ready for commit

Cool.

Comments in it reference PGAIO_HCB_SHARED_BUFFER_READV, so I'm inclined to
reorder it until after "bufmgr: Implement AIO read support".

There's also a small change in a new patch in the series (not yet sent), due
to the changes related to emitting WARNINGs about checksum failures to the
client connection. I think that part is fine, but...

> (This and the previous patch have three spots that would change with the
> s/prep/start/ renames. No opinion on whether to rename before or rename
> after.)

I thought it'd be better to do the renaming first.

> > Subject: [PATCH v2.12 07/28] localbuf: Track pincount in BufferDesc as well
>
> The plan here looks good:
> postgr.es/m/dbeeaize47y7esifdrinpa2l7cqqb67k72exvuf3appyxywjnc@7bt76mozhcy2

> > Subject: [PATCH v2.12 08/28] bufmgr: Implement AIO read support
>
> See review here and later discussion:
> postgr.es/m/20250325022037.91.nmisch@google.com

I'm working on a version with those addressed.

> > Subject: [PATCH v2.12 09/28] bufmgr: Use AIO in StartReadBuffers()
>
> Ready for commit after a batch of small things, all but one of which have no
> implications beyond code cosmetics.

Yay.

> I like the test coverage (by the end of the patch series).

I'm really shocked just how bad our test coverage for a lot of this is today
:(

> For anyone else following, I found "diff -w" helpful for the bufmgr.c
> changes. That's because a key part is former WaitReadBuffers() code moving
> up an indentation level to its home in new subroutine AsyncReadBuffers().

For reviewing changes that move stuff around a lot I find this rather helpful:
git diff --color-moved --color-moved-ws=ignore-space-change

That highlights removed code differently from moved code, and due to
ignore-space-change considers code that changed just due to space changes, to
be moved.

> > Assert(*nblocks == 1 || allow_forwarding);
> > Assert(*nblocks > 0);
> > Assert(*nblocks <= MAX_IO_COMBINE_LIMIT);
> > + Assert(*nblocks == 1 || allow_forwarding);
>
> Duplicates the assert three lines back.

Ah, it was moved into ce1a75c4fea, which I didn't notice while rebasing...

> > + nblocks = aio_ret->result.result;
> > +
> > + elog(DEBUG3, "partial read, will retry");
> > +
> > + }
> > + else if (aio_ret->result.status == PGAIO_RS_ERROR)
> > + {
> > + pgaio_result_report(aio_ret->result, &aio_ret->target_data, ERROR);
> > + nblocks = 0; /* silence compiler */
> > + }
> >
> > Assert(nblocks > 0);
> > Assert(nblocks <= MAX_IO_COMBINE_LIMIT);
> >
> > + operation->nblocks_done += nblocks;
>
> I struggled somewhat from the variety of "nblocks" variables: this local
> nblocks, operation->nblocks, actual_nblocks, and *nblocks in/out parameters of
> some functions. No one of them is clearly wrong to use the name, and some of
> these names are preexisting. That said, if you see opportunities to push in
> the direction of more-specific names, I'd welcome it.
>
> For example, this local variable could become add_to_nblocks_done instead.

I named it "newly_read_blocks", hope that works?

> > + AsyncReadBuffers(operation, &nblocks);
>
> I suggest renaming s/nblocks/ignored_nblocks_progress/ here.

Adopted.

Unfortunately I didn't see a good way to reduce the amount of the other
nblocks variables, as they are all, I think, preexisting.

> > + * If we need to wait for IO before we can get a handle, submit already
> > + * staged IO first, so that other backends don't need to wait. There
>
> s/already staged/already-staged/. Normally I'd skip this as nitpicking, but I
> misread this particular sentence twice, as "submit" being the subject that
> "staged" something. (It's still nitpicking, alas.)

Makes sense - it doesn't help that it was at a linebreak...

> > /*
> > * How many neighboring-on-disk blocks can we scatter-read into other
> > * buffers at the same time? In this case we don't wait if we see an
> > - * I/O already in progress. We already hold BM_IO_IN_PROGRESS for the
> > + * I/O already in progress. We already set BM_IO_IN_PROGRESS for the
> > * head block, so we should get on with that I/O as soon as possible.
> > - * We'll come back to this block again, above.
> > + *
> > + * We'll come back to this block in the next call to
> > + * StartReadBuffers() -> AsyncReadBuffers().
>
> Did this mean to say "WaitReadBuffers() -> AsyncReadBuffers()"? I'm guessing
> so, since WaitReadBuffers() is the one that loops. It might be referring to
> read_stream_start_pending_read()'s next StartReadBuffers(), though.

I was referring to the latter, as that is the more common case (it's pretty
easy to hit if you e.g. have multiple sequential scans on the same table
going).

> I think this could just delete the last sentence. The function header comment
> already mentions the possibility of reading a subset of the request. This
> spot doesn't need to detail how the higher layers come back to here.

Agreed.

> > + smgrstartreadv(ioh, operation->smgr, forknum,
> > + blocknum + nblocks_done,
> > + io_pages, io_buffers_len);
> > + pgstat_count_io_op_time(io_object, io_context, IOOP_READ,
> > + io_start, 1, *nblocks_progress * BLCKSZ);
>
> We don't assign *nblocks_progress until lower in the function, so I think
> "io_buffers_len" should replace "*nblocks_progress" here. (This is my only
> non-cosmetic comment on this patch.)

Good catch!

> > Subject: [PATCH v2.12 16/28] aio: Add test_aio module
>
> > +use List::Util qw(sample);
>
> sample() is new in 2020:
> https://metacpan.org/release/PEVANS/Scalar-List-Utils-1.68/source/Changes#L100
>
> Hence, I'd expect some buildfarm failures. I'd try to use shuffle(), then
> take the first N elements.

Hah. Bilal's patch was using shuffe(). I wanted to reduce the number of
iterations and first did as you suggest and then saw that there's a nicer
way...

Done that way again...

> > +++ b/src/test/modules/test_aio/test_aio.c
> > @@ -0,0 +1,712 @@
> > +/*-------------------------------------------------------------------------
> > + *
> > + * delay_execution.c
> > + * Test module to allow delay between parsing and execution of a query.
> > + *
> > + * The delay is implemented by taking and immediately releasing a specified
> > + * advisory lock. If another process has previously taken that lock, the
> > + * current process will be blocked until the lock is released; otherwise,
> > + * there's no effect. This allows an isolationtester script to reliably
> > + * test behaviors where some specified action happens in another backend
> > + * between parsing and execution of any desired query.
> > + *
> > + * Copyright (c) 2020-2025, PostgreSQL Global Development Group
> > + *
> > + * IDENTIFICATION
> > + * src/test/modules/test_aio/test_aio.c
>
> To elaborate on my last review, the entire header comment was a copy from
> delay_execution.c. v2.12 fixes the IDENTIFICATION, but the rest needs
> updates.

I was really too tired that day... Embarassing.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thom Brown 2025-03-26 21:20:47 Re: AIO v2.5
Previous Message Melanie Plageman 2025-03-26 20:00:38 Re: Parallel heap vacuum