Re: AIO v2.5

From: Andres Freund <andres(at)anarazel(dot)de>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Antonin Houska <ah(at)cybertec(dot)at>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>
Subject: Re: AIO v2.5
Date: 2025-03-23 15:11:53
Message-ID: wd7zzn6kks7otnkkb3olekvaqfnm4meuh5yugn4ghkaiw5opph@zki247ivvrsx
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2025-03-22 17:20:56 -0700, Noah Misch wrote:
> On Thu, Mar 20, 2025 at 09:58:37PM -0400, Andres Freund wrote:
> > Not sure yet how to best disable testing io_uring in this case. We can't
> > just query EXEC_BACKEND from pg_config.h unfortunately. I guess making the
> > initdb not fail and checking the error log would work, but that doesn't work
> > nicely with Cluster.pm.
>
> How about "postgres -c io_method=io_uring -C <anything>":
>
> --- a/src/test/modules/test_aio/t/001_aio.pl
> +++ b/src/test/modules/test_aio/t/001_aio.pl
> @@ -29,7 +29,13 @@ $node_worker->stop();
> # Test io_method=io_uring
> ###
>
> -if ($ENV{with_liburing} eq 'yes')
> +sub have_io_uring
> +{
> + local %ENV = $node_worker->_get_env(); # any node works
> + return run_log [qw(postgres -c io_method=io_uring -C io_method)];
> +}
> +
> +if (have_io_uring())
> {
> my $node_uring = create_node('io_uring');
> $node_uring->start();

Yea, that's a good idea.

One thing that doesn't seem great is that it requires a prior node - what if
we do -c io_method=invalid' that would report the list of valid GUC options,
so we could just grep for io_uring?

It's too bad that postgres --describe-config
a) doesn't report the possible enum values
b) doesn't apply/validate -c options

> > Subject: [PATCH v2.11 01/27] aio, bufmgr: Comment fixes
>
> Ready to commit, though other comment fixes might come up in later reviews.

I'll reorder it to a bit later in the series, to accumulate a few more.

> One idea so far is to comment on valid states after some IoMethodOps
> callbacks:
>
> --- a/src/include/storage/aio_internal.h
> +++ b/src/include/storage/aio_internal.h
> @@ -310,6 +310,9 @@ typedef struct IoMethodOps
> /*
> * Start executing passed in IOs.
> *
> + * Shall advance state to PGAIO_HS_SUBMITTED. (By the time this returns,
> + * other backends might have advanced the state further.)
> + *
> * Will not be called if ->needs_synchronous_execution() returned true.
> *
> * num_staged_ios is <= PGAIO_SUBMIT_BATCH_SIZE.
> @@ -321,6 +324,12 @@ typedef struct IoMethodOps
> /*
> * Wait for the IO to complete. Optional.
> *
> + * On return, state shall be PGAIO_HS_COMPLETED_IO,
> + * PGAIO_HS_COMPLETED_SHARED or PGAIO_HS_COMPLETED_LOCAL. (The callback
> + * need not change the state if it's already one of those.) If state is
> + * PGAIO_HS_COMPLETED_IO, state will reach PGAIO_HS_COMPLETED_SHARED
> + * without further intervention.
> + *
> * If not provided, it needs to be guaranteed that the IO method calls
> * pgaio_io_process_completion() without further interaction by the
> * issuing backend.

I think these are a good idea. I added those to the copy-edit patch, with a
few more tweaks:

@@ -315,6 +315,9 @@ typedef struct IoMethodOps
/*
* Start executing passed in IOs.
*
+ * Shall advance state to at least PGAIO_HS_SUBMITTED. (By the time this
+ * returns, other backends might have advanced the state further.)
+ *
* Will not be called if ->needs_synchronous_execution() returned true.
*
* num_staged_ios is <= PGAIO_SUBMIT_BATCH_SIZE.
@@ -323,12 +326,24 @@ typedef struct IoMethodOps
*/
int (*submit) (uint16 num_staged_ios, PgAioHandle **staged_ios);

- /*
+ /* ---
* Wait for the IO to complete. Optional.
*
+ * On return, state shall be on of
+ * - PGAIO_HS_COMPLETED_IO
+ * - PGAIO_HS_COMPLETED_SHARED
+ * - PGAIO_HS_COMPLETED_LOCAL
+ *
+ * The callback must not block if the handle is already in one of those
+ * states, or has been reused (see pgaio_io_was_recycled()). If, on
+ * return, the state is PGAIO_HS_COMPLETED_IO, state will reach
+ * PGAIO_HS_COMPLETED_SHARED without further intervention by the IO
+ * method.
+ *
* If not provided, it needs to be guaranteed that the IO method calls
* pgaio_io_process_completion() without further interaction by the
* issuing backend.
+ * ---
*/
void (*wait_one) (PgAioHandle *ioh,
uint64 ref_generation);

> > Subject: [PATCH v2.11 03/27] Redefine max_files_per_process to control
> > additionally opened files
>
> Ready to commit

Cool!

> > Subject: [PATCH v2.11 04/27] aio: Add liburing dependency
>
> > --- a/meson.build
> > +++ b/meson.build
> > @@ -944,6 +944,18 @@ endif
> >
> >
> >
> > +###############################################################
> > +# Library: liburing
> > +###############################################################
> > +
> > +liburingopt = get_option('liburing')
> > +liburing = dependency('liburing', required: liburingopt)
> > +if liburing.found()
> > + cdata.set('USE_LIBURING', 1)
> > +endif
>
> This is a different style from other deps; is it equivalent to our standard
> style?

Yes - the only reason to be more complicated in the lz4 case is that we want
to fall back to other ways of looking up the dependency (primarily because of
windows. But that's not required for liburing, which oviously is linux only.

> > --- a/configure.ac
> > +++ b/configure.ac
> > @@ -975,6 +975,14 @@ AC_SUBST(with_readline)
> > PGAC_ARG_BOOL(with, libedit-preferred, no,
> > [prefer BSD Libedit over GNU Readline])
> >
> > +#
> > +# liburing
> > +#
> > +AC_MSG_CHECKING([whether to build with liburing support])
> > +PGAC_ARG_BOOL(with, liburing, no, [io_uring support, for asynchronous I/O],
>
> Fourth arg generally starts with "build" for args like this. I suggest "build
> with io_uring support, for asynchronous I/O".

WFM.

> > + [AC_DEFINE([USE_LIBURING], 1, [Define to build with io_uring support. (--with-liburing)])])
> > +AC_MSG_RESULT([$with_liburing])
> > +AC_SUBST(with_liburing)
> >
> > #
> > # UUID library
> > @@ -1463,6 +1471,9 @@ elif test "$with_uuid" = ossp ; then
> > fi
> > AC_SUBST(UUID_LIBS)
> >
> > +if test "$with_liburing" = yes; then
> > + PKG_CHECK_MODULES(LIBURING, liburing)
> > +fi
>
> We usually put this right after the AC_MSG_CHECKING ... AC_SUBST block.

We don't really seem to do that for "dependency checks" in general, e.g.
PGAC_CHECK_PERL_CONFIGS, PGAC_CHECK_PYTHON_EMBED_SETUP, PGAC_CHECK_READLINE,
dependency dependent AC_CHECK_LIB calls, .. later in configure.ac than the
defnition of the option. TBH, I've always struggled trying to discern what
the organizing principle of configure.ac is.

But you're right that the PKG_CHECK_MODULES calls are closer-by. And I'm happy
to move towards having the code for each dep all in one place, so moved.

A related thing: We seem to have no order of the $with_ checks that I can
discern. Should the liburing check be at a different place?

> This currently has unrelated stuff separating them. Also, with the
> exception of icu, we follow PKG_CHECK_MODULES uses by absorbing flags from
> pkg-config and use AC_CHECK_LIB to add the actual "-l".

I think for liburing I was trying to follow ICU's example - injecting CFLAGS
and LIBS just in the parts of the build dir that needs them.

For LIBS I think I did so:

diff --git a/src/backend/Makefile b/src/backend/Makefile
...
+# The backend conditionally needs libraries that most executables don't need.
+LIBS += $(LDAP_LIBS_BE) $(ICU_LIBS) $(LIBURING_LIBS)

But ugh, for some reason I didn't do that for LIBURING_CFLAGS. In the v1.x
version of aio I had
aio:src/backend/storage/aio/Makefile:override CPPFLAGS += $(LIBURING_CFLAGS)

but somehow lost that somewhere along the way to v2.x

I think I like targetting where ${LIB}_LIBS and ${LIB}_CFLAGS are applied more
narrowly better than just adding to the global CFLAGS, CPPFLAGS, LDFLAGS. I'm
somewhat inclined to add it LIBURING_CFLAGS in src/backend rather than
src/backend/storage/aio/ though.

But I'm also willing to do it entirely differently.

> > --- a/doc/src/sgml/installation.sgml
> > +++ b/doc/src/sgml/installation.sgml
>
> lz4 and other deps have a mention in <sect1 id="install-requirements">, in
> addition to sections edited here.

Good point.

Although once more I feel defeated by the ordering used :)

Hm, that list is rather incomplete. At least libxml, libxslt, selinux, curl,
uuid, systemd, selinux and bonjour aren't listed.

Not sure if it makes sense to add liburing, given that?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2025-03-23 15:15:11 Re: doc patch: wrong descriptions for dropping replication slots
Previous Message Fujii Masao 2025-03-23 15:08:32 Re: Change log level for notifying hot standby is waiting non-overflowed snapshot