Re: AIO v2.5

From: Andres Freund <andres(at)anarazel(dot)de>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Antonin Houska <ah(at)cybertec(dot)at>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>
Subject: Re: AIO v2.5
Date: 2025-03-23 15:57:48
Message-ID: cdxulvhcemfcvluc6cttva5f5gthya6jcxrj64yhnmy3rmufe2@7kbk657ixbuu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2025-03-22 19:09:55 -0700, Noah Misch wrote:
> On Thu, Mar 20, 2025 at 09:58:37PM -0400, Andres Freund wrote:
> > Attached v2.11
>
> > Subject: [PATCH v2.11 05/27] aio: Add io_method=io_uring
>
> Apart from some isolated cosmetic points, this is ready to commit:
>
> > + ereport(ERROR,
> > + errcode(err),
> > + errmsg("io_uring_queue_init failed: %m"),
> > + hint != NULL ? errhint("%s", hint) : 0);
>
> https://www.postgresql.org/docs/current/error-style-guide.html gives the example:
>
> BAD: open() failed: %m
> BETTER: could not open file %s: %m
>
> Hence, this errmsg should change, perhaps to:
> "could not setup io_uring queues: %m".

You're right. I didn't intentionally "violate" the policy, but I do have to
admit, I'm not a huge fan of that aspect, it just obfuscates what actually
failed, forcing one to look at the code or strace to figure out what precisely
failed.

(Changed)

> > + pgaio_debug_io(DEBUG3, ioh,
> > + "wait_one io_gen: %llu, ref_gen: %llu, cycle %d",
> > + (long long unsigned) ref_generation,
> > + (long long unsigned) ioh->generation,
>
> In the message string, io_gen appears before ref_gen. In the subsequent args,
> the order is swapped relative to the message string.

Oops, you're right.

> > --- a/src/backend/utils/activity/wait_event_names.txt
> > +++ b/src/backend/utils/activity/wait_event_names.txt
> > @@ -192,6 +192,8 @@ ABI_compatibility:
> >
> > Section: ClassName - WaitEventIO
> >
> > +AIO_IO_URING_SUBMIT "Waiting for IO submission via io_uring."
> > +AIO_IO_URING_COMPLETION "Waiting for IO completion via io_uring."
> > AIO_IO_COMPLETION "Waiting for IO completion."
>
> I'm wondering if there's an opportunity to enrich the last two wait event
> names and/or descriptions. The current descriptions suggest to me more
> similarity than is actually there. Inputs to the decision:
>
> - AIO_IO_COMPLETION waits for an IO in PGAIO_HS_DEFINED, PGAIO_HS_STAGED, or
> PGAIO_HS_COMPLETED_IO to reach PGAIO_HS_COMPLETED_SHARED. The three
> starting states are the states where some other backend owns the next
> action, so the current backend can only wait to be signaled.
>
> - AIO_IO_URING_COMPLETION waits for the kernel to do enough so we can move
> from PGAIO_HS_SUBMITTED to PGAIO_HS_COMPLETED_IO.
>
> Possible names and descriptions, based on PgAioHandleState enum names and
> comments:
>
> AIO_IO_URING_COMPLETED_IO "Waiting for IO result via io_uring."
> AIO_COMPLETED_SHARED "Waiting for IO shared completion callback."
>
> If "shared completion callback" is too internals-focused, perhaps this:
>
> AIO_IO_URING_COMPLETED_IO "Waiting for IO result via io_uring."
> AIO_COMPLETED_SHARED "Waiting for IO completion to update shared memory."

Hm, right now AIO_IO_COMPLETION also covers the actual "raw" execution of the
IO with io_method=worker/sync. For that AIO_COMPLETED_SHARED would be
inappropriate.

We could use a different wait event if wait for an IO via CV in
PGAIO_HS_SUBMITTED, with a small refactoring of pgaio_io_wait(). But I'm not
sure that would get you that far - we don't broadcast the CV when
transitioning from PGAIO_HS_SUBMITTED -> PGAIO_HS_COMPLETED_IO, so the wait
event would stay the same, now wrong, wait event until the shared callback
completes. Obviously waking everyone up just so they can use a differen wait
event doesn't make sense.

A more minimal change would be to narrow AIO_IO_URING_COMPLETION to
"execution" or something like that, to hint at a separation between the raw IO
being completed and the IO, including the callbacks completing.

> > --- a/doc/src/sgml/config.sgml
> > +++ b/doc/src/sgml/config.sgml
> > @@ -2710,6 +2710,12 @@ include_dir 'conf.d'
> > <literal>worker</literal> (execute asynchronous I/O using worker processes)
> > </para>
> > </listitem>
> > + <listitem>
> > + <para>
> > + <literal>io_uring</literal> (execute asynchronous I/O using
> > + io_uring, if available)
>
> I feel the "if available" doesn't quite fit, since we'll fail if unavailable.
> Maybe just "(execute asynchronous I/O using Linux io_uring)" with "Linux"
> there to reduce surprise on other platforms.

You're right, the if available can be misunderstood. But not mentioning that
it's an optional dependency seems odd too. What about something like

<para>
<literal>io_uring</literal> (execute asynchronous I/O using
io_uring, requires postgres to have been built with
<link linkend="configure-option-with-liburing"><option>--with-liburing</option></link> /
<link linkend="configure-with-liburing-meson"><option>-Dliburing</option></link>)
</para>

Should the docs for --with-liburing/-Dliburing mention it's linux only? We
don't seem to do that for things like systemd (linux), selinux (linux) and
only kinda for bonjour (macos).

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2025-03-23 16:32:48 Re: AIO v2.5
Previous Message Noah Misch 2025-03-23 15:55:29 Re: AIO v2.5