Re: AIO v2.5

From: Noah Misch <noah(at)leadboat(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Antonin Houska <ah(at)cybertec(dot)at>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>
Subject: Re: AIO v2.5
Date: 2025-03-26 00:19:15
Message-ID: 20250326001915.bc.nmisch@google.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 24, 2025 at 09:18:06PM -0400, Andres Freund wrote:
> Attached v2.12, with the following changes:

> TODO:

> Wonder if it's worth adding some coverage for when checksums are disabled?
> Probably not necessary?

Probably not necessary, agreed. Orthogonal to AIO, it's likely worth a CI
"SPECIAL" and/or buildfarm animal that runs all tests w/ checksums disabled.

> Subject: [PATCH v2.12 01/28] aio: Be more paranoid about interrupts

Ready for commit

> Subject: [PATCH v2.12 02/28] aio: Pass result of local callbacks to
> ->report_return

Ready for commit w/ up to one cosmetic change:

> @@ -296,7 +299,9 @@ pgaio_io_call_complete_local(PgAioHandle *ioh)
>
> /*
> * Note that we don't save the result in ioh->distilled_result, the local
> - * callback's result should not ever matter to other waiters.
> + * callback's result should not ever matter to other waiters. However, the
> + * local backend does care, so we return the result as modified by local
> + * callbacks, which then can be passed to ioh->report_return->result.
> */
> pgaio_debug_io(DEBUG3, ioh,
> "after local completion: distilled result: (status %s, id %u, error_data %d, result %d), raw_result: %d",

Should this debug message remove the word "distilled", since this commit
solidifies distilled_result as referring to the complete_shared result?

> Subject: [PATCH v2.12 03/28] aio: Add liburing dependency

Ready for commit

> Subject: [PATCH v2.12 04/28] aio: Add io_method=io_uring

Ready for commit w/ open_fd.fixup

> Subject: [PATCH v2.12 05/28] aio: Implement support for reads in smgr/md/fd

Ready for commit w/ up to two cosmetic changes:

> +/*
> + * AIO error reporting callback for mdstartreadv().
> + *
> + * Errors are encoded as follows:
> + * - PgAioResult.error_data != 0 encodes IO that failed with errno != 0

I recommend replacing "errno != 0" with either "that errno" or "errno ==
error_data".

Second, the aio_internal.h comment changes discussed in
postgr.es/m/20250325155808.f7.nmisch@google.com and earlier.

> Subject: [PATCH v2.12 06/28] aio: Add README.md explaining higher level design

Ready for commit

(This and the previous patch have three spots that would change with the
s/prep/start/ renames. No opinion on whether to rename before or rename
after.)

> Subject: [PATCH v2.12 07/28] localbuf: Track pincount in BufferDesc as well

The plan here looks good:
postgr.es/m/dbeeaize47y7esifdrinpa2l7cqqb67k72exvuf3appyxywjnc@7bt76mozhcy2

> Subject: [PATCH v2.12 08/28] bufmgr: Implement AIO read support

See review here and later discussion:
postgr.es/m/20250325022037.91.nmisch@google.com

> Subject: [PATCH v2.12 09/28] bufmgr: Use AIO in StartReadBuffers()

Ready for commit after a batch of small things, all but one of which have no
implications beyond code cosmetics. This is my first comprehensive review of
this patch. I like the test coverage (by the end of the patch series). For
anyone else following, I found "diff -w" helpful for the bufmgr.c changes.
That's because a key part is former WaitReadBuffers() code moving up an
indentation level to its home in new subroutine AsyncReadBuffers().

> Assert(*nblocks == 1 || allow_forwarding);
> Assert(*nblocks > 0);
> Assert(*nblocks <= MAX_IO_COMBINE_LIMIT);
> + Assert(*nblocks == 1 || allow_forwarding);

Duplicates the assert three lines back.

> + nblocks = aio_ret->result.result;
> +
> + elog(DEBUG3, "partial read, will retry");
> +
> + }
> + else if (aio_ret->result.status == PGAIO_RS_ERROR)
> + {
> + pgaio_result_report(aio_ret->result, &aio_ret->target_data, ERROR);
> + nblocks = 0; /* silence compiler */
> + }
>
> Assert(nblocks > 0);
> Assert(nblocks <= MAX_IO_COMBINE_LIMIT);
>
> + operation->nblocks_done += nblocks;

I struggled somewhat from the variety of "nblocks" variables: this local
nblocks, operation->nblocks, actual_nblocks, and *nblocks in/out parameters of
some functions. No one of them is clearly wrong to use the name, and some of
these names are preexisting. That said, if you see opportunities to push in
the direction of more-specific names, I'd welcome it.

For example, this local variable could become add_to_nblocks_done instead.

> + AsyncReadBuffers(operation, &nblocks);

I suggest renaming s/nblocks/ignored_nblocks_progress/ here.

> + * If we need to wait for IO before we can get a handle, submit already
> + * staged IO first, so that other backends don't need to wait. There

s/already staged/already-staged/. Normally I'd skip this as nitpicking, but I
misread this particular sentence twice, as "submit" being the subject that
"staged" something. (It's still nitpicking, alas.)

> /*
> * How many neighboring-on-disk blocks can we scatter-read into other
> * buffers at the same time? In this case we don't wait if we see an
> - * I/O already in progress. We already hold BM_IO_IN_PROGRESS for the
> + * I/O already in progress. We already set BM_IO_IN_PROGRESS for the
> * head block, so we should get on with that I/O as soon as possible.
> - * We'll come back to this block again, above.
> + *
> + * We'll come back to this block in the next call to
> + * StartReadBuffers() -> AsyncReadBuffers().

Did this mean to say "WaitReadBuffers() -> AsyncReadBuffers()"? I'm guessing
so, since WaitReadBuffers() is the one that loops. It might be referring to
read_stream_start_pending_read()'s next StartReadBuffers(), though.

I think this could just delete the last sentence. The function header comment
already mentions the possibility of reading a subset of the request. This
spot doesn't need to detail how the higher layers come back to here.

> + smgrstartreadv(ioh, operation->smgr, forknum,
> + blocknum + nblocks_done,
> + io_pages, io_buffers_len);
> + pgstat_count_io_op_time(io_object, io_context, IOOP_READ,
> + io_start, 1, *nblocks_progress * BLCKSZ);

We don't assign *nblocks_progress until lower in the function, so I think
"io_buffers_len" should replace "*nblocks_progress" here. (This is my only
non-cosmetic comment on this patch.)

> Subject: [PATCH v2.12 10/28] aio: Basic read_stream adjustments for real AIO

(Still reviewing this and later patches, but incidental observations follow.)

> Subject: [PATCH v2.12 16/28] aio: Add test_aio module

> +use List::Util qw(sample);

sample() is new in 2020:
https://metacpan.org/release/PEVANS/Scalar-List-Utils-1.68/source/Changes#L100

Hence, I'd expect some buildfarm failures. I'd try to use shuffle(), then
take the first N elements.

> +++ b/src/test/modules/test_aio/test_aio.c
> @@ -0,0 +1,712 @@
> +/*-------------------------------------------------------------------------
> + *
> + * delay_execution.c
> + * Test module to allow delay between parsing and execution of a query.
> + *
> + * The delay is implemented by taking and immediately releasing a specified
> + * advisory lock. If another process has previously taken that lock, the
> + * current process will be blocked until the lock is released; otherwise,
> + * there's no effect. This allows an isolationtester script to reliably
> + * test behaviors where some specified action happens in another backend
> + * between parsing and execution of any desired query.
> + *
> + * Copyright (c) 2020-2025, PostgreSQL Global Development Group
> + *
> + * IDENTIFICATION
> + * src/test/modules/test_aio/test_aio.c

To elaborate on my last review, the entire header comment was a copy from
delay_execution.c. v2.12 fixes the IDENTIFICATION, but the rest needs
updates.

Thanks,
nm

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2025-03-26 00:34:02 Re: query_id: jumble names of temp tables for better pg_stat_statement UX
Previous Message Andres Freund 2025-03-26 00:17:17 Re: AIO v2.5