Re: AIO v2.5

From: Noah Misch <noah(at)leadboat(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Antonin Houska <ah(at)cybertec(dot)at>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>
Subject: Re: AIO v2.5
Date: 2025-03-23 16:32:48
Message-ID: 20250323163248.87.nmisch@google.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Mar 23, 2025 at 11:57:48AM -0400, Andres Freund wrote:
> On 2025-03-22 19:09:55 -0700, Noah Misch wrote:
> > On Thu, Mar 20, 2025 at 09:58:37PM -0400, Andres Freund wrote:
> > > Attached v2.11
> >
> > > Subject: [PATCH v2.11 05/27] aio: Add io_method=io_uring

> > > --- a/src/backend/utils/activity/wait_event_names.txt
> > > +++ b/src/backend/utils/activity/wait_event_names.txt
> > > @@ -192,6 +192,8 @@ ABI_compatibility:
> > >
> > > Section: ClassName - WaitEventIO
> > >
> > > +AIO_IO_URING_SUBMIT "Waiting for IO submission via io_uring."
> > > +AIO_IO_URING_COMPLETION "Waiting for IO completion via io_uring."
> > > AIO_IO_COMPLETION "Waiting for IO completion."
> >
> > I'm wondering if there's an opportunity to enrich the last two wait event
> > names and/or descriptions. The current descriptions suggest to me more
> > similarity than is actually there. Inputs to the decision:
> >
> > - AIO_IO_COMPLETION waits for an IO in PGAIO_HS_DEFINED, PGAIO_HS_STAGED, or
> > PGAIO_HS_COMPLETED_IO to reach PGAIO_HS_COMPLETED_SHARED. The three
> > starting states are the states where some other backend owns the next
> > action, so the current backend can only wait to be signaled.
> >
> > - AIO_IO_URING_COMPLETION waits for the kernel to do enough so we can move
> > from PGAIO_HS_SUBMITTED to PGAIO_HS_COMPLETED_IO.
> >
> > Possible names and descriptions, based on PgAioHandleState enum names and
> > comments:
> >
> > AIO_IO_URING_COMPLETED_IO "Waiting for IO result via io_uring."
> > AIO_COMPLETED_SHARED "Waiting for IO shared completion callback."
> >
> > If "shared completion callback" is too internals-focused, perhaps this:
> >
> > AIO_IO_URING_COMPLETED_IO "Waiting for IO result via io_uring."
> > AIO_COMPLETED_SHARED "Waiting for IO completion to update shared memory."
>
> Hm, right now AIO_IO_COMPLETION also covers the actual "raw" execution of the
> IO with io_method=worker/sync.

Right, it could start with the IO in PGAIO_HS_DEFINED and end with the IO in
PGAIO_HS_COMPLETED_SHARED. So another part of the wait may be the definer
doing work before exiting batch mode.

> For that AIO_COMPLETED_SHARED would be
> inappropriate.

The concept I had in mind was "waiting to reach PGAIO_HS_COMPLETED_SHARED,
whatever obstacles that involves".

Another candidate description string:

AIO_COMPLETED_SHARED "Waiting for another process to complete IO."

> We could use a different wait event if wait for an IO via CV in
> PGAIO_HS_SUBMITTED, with a small refactoring of pgaio_io_wait(). But I'm not
> sure that would get you that far - we don't broadcast the CV when
> transitioning from PGAIO_HS_SUBMITTED -> PGAIO_HS_COMPLETED_IO, so the wait
> event would stay the same, now wrong, wait event until the shared callback
> completes. Obviously waking everyone up just so they can use a differen wait
> event doesn't make sense.

Agreed. The mapping of code ranges to wait events seems fine to me. I'm mainly
trying to optimize the wait event description strings to fit those code ranges.

> A more minimal change would be to narrow AIO_IO_URING_COMPLETION to
> "execution" or something like that, to hint at a separation between the raw IO
> being completed and the IO, including the callbacks completing.

Yes, that would work for me.

> > > --- a/doc/src/sgml/config.sgml
> > > +++ b/doc/src/sgml/config.sgml
> > > @@ -2710,6 +2710,12 @@ include_dir 'conf.d'
> > > <literal>worker</literal> (execute asynchronous I/O using worker processes)
> > > </para>
> > > </listitem>
> > > + <listitem>
> > > + <para>
> > > + <literal>io_uring</literal> (execute asynchronous I/O using
> > > + io_uring, if available)
> >
> > I feel the "if available" doesn't quite fit, since we'll fail if unavailable.
> > Maybe just "(execute asynchronous I/O using Linux io_uring)" with "Linux"
> > there to reduce surprise on other platforms.
>
> You're right, the if available can be misunderstood. But not mentioning that
> it's an optional dependency seems odd too. What about something like
>
> <para>
> <literal>io_uring</literal> (execute asynchronous I/O using
> io_uring, requires postgres to have been built with
> <link linkend="configure-option-with-liburing"><option>--with-liburing</option></link> /
> <link linkend="configure-with-liburing-meson"><option>-Dliburing</option></link>)
> </para>

I'd change s/postgres to have been built/a build with/ since the SGML docs
don't use the term "postgres" that way. Otherwise, that works for me.

> Should the docs for --with-liburing/-Dliburing mention it's linux only? We
> don't seem to do that for things like systemd (linux), selinux (linux) and
> only kinda for bonjour (macos).

No need, I think.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2025-03-23 16:43:05 Re: Snapshot related assert failure on skink
Previous Message Andres Freund 2025-03-23 15:57:48 Re: AIO v2.5