From: | Noah Misch <noah(at)leadboat(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Antonin Houska <ah(at)cybertec(dot)at>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl> |
Subject: | Re: AIO v2.5 |
Date: | 2025-03-26 18:31:02 |
Message-ID: | 20250326183102.92.nmisch@google.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
I reviewed everything up to and including "[PATCH v2.12 17/28] aio, bufmgr:
Comment fixes", the last patch before write support.
postgr.es/m/20250326001915.bc.nmisch@google.com covered patches 1-9, and this
email covers patches 10-17. All remaining review comments are minor, so I've
marked the commitfest entry Ready for Committer. If there's anything you'd
like re-reviewed before you commit it, feel free to bring it to my attention.
Thanks for getting the feature to this stage!
On Mon, Mar 24, 2025 at 09:18:06PM -0400, Andres Freund wrote:
> Subject: [PATCH v2.12 10/28] aio: Basic read_stream adjustments for real AIO
> @@ -631,6 +637,9 @@ read_stream_begin_impl(int flags,
> * For now, max_ios = 0 is interpreted as max_ios = 1 with advice disabled
> * above. If we had real asynchronous I/O we might need a slightly
> * different definition.
> + *
> + * FIXME: Not sure what different definition we would need? I guess we
> + * could add the READ_BUFFERS_SYNCHRONOUSLY flag automatically?
I think we don't need a different definition. max_ios comes from
effective_io_concurrency and similar settings. The above comment's definition
of max_ios=0 matches that GUC's documented behavior:
The allowed range is
<literal>1</literal> to <literal>1000</literal>, or
<literal>0</literal> to disable issuance of asynchronous I/O requests.
I'll guess the comment meant that "advice disabled" is a no-op for AIO, so we
could reasonably argue to have effective_io_concurrency=0 distinguish itself
from effective_io_concurrency=1 in some different way for AIO. Equally,
there's no hurry to use that freedom to distinguish them.
> Subject: [PATCH v2.12 11/28] read_stream: Introduce and use optional batchmode
> support
> This patch adds an explicit flag (READ_STREAM_USE_BATCHING) to read_stream and
> uses it where appropriate.
I'd also use the new flag on the read_stream_begin_smgr_relation() call in
RelationCopyStorageUsingBuffer(). It uses block_range_read_stream_cb, and
other streams of that callback rightly use the flag.
> + * b) directly or indirectly start another batch pgaio_enter_batchmode()
Needs new wording from end of postgr.es/m/20250325155808.f7.nmisch@google.com
> Subject: [PATCH v2.12 12/28] docs: Reframe track_io_timing related docs as
> wait time
> Subject: [PATCH v2.12 13/28] Enable IO concurrency on all systems
Consider also updating this comment to stop focusing on prefetch; I think
changing that aligns with the patch's other changes:
/*
* How many buffers PrefetchBuffer callers should try to stay ahead of their
* ReadBuffer calls by. Zero means "never prefetch". This value is only used
* for buffers not belonging to tablespaces that have their
* effective_io_concurrency parameter set.
*/
int effective_io_concurrency = DEFAULT_EFFECTIVE_IO_CONCURRENCY;
> -#io_combine_limit = 128kB # usually 1-128 blocks (depends on OS)
> +#io_combine_limit = 128kB # usually 1-32 blocks (depends on OS)
I think "usually 1-128" remains right given:
GUC_UNIT_BLOCKS
#define MAX_IO_COMBINE_LIMIT PG_IOV_MAX
#define PG_IOV_MAX Min(IOV_MAX, 128)
> - On systems without prefetch advice support, attempting to configure
> - any value other than <literal>0</literal> will error out.
> + On systems with prefetch advice support,
> + <varname>effective_io_concurrency</varname> also controls the prefetch distance.
Wrap the last line.
> Subject: [PATCH v2.12 14/28] docs: Add acronym and glossary entries for I/O
> and AIO
> These could use a lot more polish.
To me, it's fine as-is.
> I did not actually reference the new entries yet, because I don't really
> understand what our policy for that is.
I haven't seen much of a policy on that.
> Subject: [PATCH v2.12 15/28] aio: Add pg_aios view
> +retry:
> +
> + /*
> + * There is no lock that could prevent the state of the IO to advance
> + * concurrently - and we don't want to introduce one, as that would
> + * introduce atomics into a very common path. Instead we
> + *
> + * 1) Determine the state + generation of the IO.
> + *
> + * 2) Copy the IO to local memory.
> + *
> + * 3) Check if state or generation of the IO changed. If the state
> + * changed, retry, if the generation changed don't display the IO.
> + */
> +
> + /* 1) from above */
> + start_generation = live_ioh->generation;
> + pg_read_barrier();
Based on the "really started after this function was called" and "no risk of a
livelock here" comments below, I think "retry:" should be here. We don't
want to livelock in the form of chasing ever-growing start_generation numbers.
> + /*
> + * The IO completed and a new one was started with the same ID. Don't
> + * display it - it really started after this function was called.
> + * There be a risk of a livelock if we just retried endlessly, if IOs
> + * complete very quickly.
> + */
> + if (live_ioh->generation != start_generation)
> + continue;
> +
> + /*
> + * The IOs state changed while we were "rendering" it. Just start from
s/IOs/IO's/
> + * scratch. There's no risk of a livelock here, as an IO has a limited
> + * sets of states it can be in, and state changes go only in a single
> + * direction.
> + */
> + if (live_ioh->state != start_state)
> + goto retry;
> + <entry role="catalog_table_entry"><para role="column_definition">
> + <structfield>target</structfield> <type>text</type>
> + </para>
> + <para>
> + What kind of object is the I/O targeting:
> + <itemizedlist spacing="compact">
> + <listitem>
> + <para>
> + <literal>smgr</literal>, I/O on postgres relations
s/postgres relations/relations/ since SGML docs don't use the term "postgres"
that way.
> Subject: [PATCH v2.12 16/28] aio: Add test_aio module
> --- a/src/test/modules/meson.build
> +++ b/src/test/modules/meson.build
> @@ -1,5 +1,6 @@
> # Copyright (c) 2022-2025, PostgreSQL Global Development Group
>
> +subdir('test_aio')
> subdir('brin')
List is alphabetized; please preserve that.
> +++ b/src/test/modules/test_aio/Makefile
> @@ -0,0 +1,26 @@
> +# src/test/modules/delay_execution/Makefile
Update filename in comment.
> +++ b/src/test/modules/test_aio/meson.build
> @@ -0,0 +1,37 @@
> +# Copyright (c) 2022-2024, PostgreSQL Global Development Group
s/2024/2025/
> --- /dev/null
> +++ b/src/test/modules/test_aio/t/001_aio.pl
s/ {4}/\t/g on this file. It's mostly \t now, with some exceptions.
> + test_inject_worker('worker', $node_worker);
What do we expect to happen if autovacuum or checkpointer runs one of these
injection points? I'm guessing it would at most make that process fail
without affecting the test outcome. If so, that's fine.
> + $waitfor,);
s/,//
> + # normal handle use
> + psql_like($io_method, $psql, "handle_get_release()",
> + qq(SELECT handle_get_release()),
> + qr/^$/, qr/^$/);
> +
> + # should error out, API violation
> + psql_like($io_method, $psql, "handle_get_twice()",
> + qq(SELECT handle_get_release()),
> + qr/^$/, qr/^$/);
Last two lines are a clone of the previous psql_like() call. I guess this
wants to instead call handle_get_twice() and check for some stderr.
> + "read_rel_block_ll() of $tblname page",
What does "_ll" stand for?
> + # Issue IO without waiting for completion, then exit
> + $psql_a->query_safe(
> + qq(SELECT read_rel_block_ll('tbl_ok', 1, wait_complete=>false);));
> + $psql_a->reconnect_and_clear();
> +
> + # Check that another backend can read the relevant block
> + psql_like(
> + $io_method,
> + $psql_b,
> + "completing read started by exited backend",
I think the exiting backend's pgaio_shutdown() completed it.
> +sub test_inject
This deserves a brief comment on the behaviors being tested, like the previous
functions have. It seems to be about short reads and hard failures like EIO.
> Subject: [PATCH v2.12 17/28] aio, bufmgr: Comment fixes
From | Date | Subject | |
---|---|---|---|
Next Message | Nikolay Shaplov | 2025-03-26 18:34:16 | Re: vacuum_truncate configuration parameter and isset_offset |
Previous Message | Andres Freund | 2025-03-26 18:22:14 | Re: Current master hangs under the debugger after Parallel Seq Scan (Linux, MacOS) |