Re: AIO v2.5

From: Noah Misch <noah(at)leadboat(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Antonin Houska <ah(at)cybertec(dot)at>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>
Subject: Re: AIO v2.5
Date: 2025-03-23 00:20:56
Message-ID: 20250323002056.95.nmisch@google.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 20, 2025 at 09:58:37PM -0400, Andres Freund wrote:
> Attached v2.11, with the following changes:

> - Added an error check for FileStartReadV() failing
>
> FileStartReadV() actually can fail, if the file can't be re-opened. I
> thought it'd be important for the error message to differ from the one
> that's issued for read actually failing, so I went with:
>
> "could not start reading blocks %u..%u in file \"%s\": %m"
>
> but I'm not sure how good that is.

Message looks good.

> - Improved error message if io_uring_queue_init() fails
>
> Added errhint()s for likely cases of failure.
>
> Added errcode(). I was tempted to use errcode_for_file_access(), but that
> doesn't support ENOSYS - perhaps I should add that instead?

Either way is fine with me. ENOSYS -> ERRCODE_FEATURE_NOT_SUPPORTED is a good
general mapping to have in errcode_for_file_access(), but it's also not a
problem to keep it the way v2.11 has it.

> - Disable io_uring method when using EXEC_BACKEND, they're not compatible
>
> I chose to do this with a define aio.h, but I guess we could also do it at
> configure time? That seems more complicated though - how would we even know
> that EXEC_BACKEND is used on non-windows?

Agreed, "make PROFILE=-DEXEC_BACKEND" is a valid way to get EXEC_BACKEND.

> Not sure yet how to best disable testing io_uring in this case. We can't
> just query EXEC_BACKEND from pg_config.h unfortunately. I guess making the
> initdb not fail and checking the error log would work, but that doesn't work
> nicely with Cluster.pm.

How about "postgres -c io_method=io_uring -C <anything>":

--- a/src/test/modules/test_aio/t/001_aio.pl
+++ b/src/test/modules/test_aio/t/001_aio.pl
@@ -29,7 +29,13 @@ $node_worker->stop();
# Test io_method=io_uring
###

-if ($ENV{with_liburing} eq 'yes')
+sub have_io_uring
+{
+ local %ENV = $node_worker->_get_env(); # any node works
+ return run_log [qw(postgres -c io_method=io_uring -C io_method)];
+}
+
+if (have_io_uring())
{
my $node_uring = create_node('io_uring');
$node_uring->start();

> Questions:
>
>
> - We only "look" at BM_IO_ERROR for writes, isn't that somewhat weird?
>
> See AbortBufferIO(Buffer buffer)
>
> It doesn't really matter for the patchset, but it just strikes me as an oddity.

That caught my attention in an earlier review round, but I didn't find it
important enough to raise. It's mildly unfortunate to be setting BM_IO_ERROR
for reads when the only thing BM_IO_ERROR drives is message "Multiple failures
--- write error might be permanent." It's minor, so let's leave it that way
for the foreseeable future.

> Subject: [PATCH v2.11 01/27] aio, bufmgr: Comment fixes

Ready to commit, though other comment fixes might come up in later reviews.
One idea so far is to comment on valid states after some IoMethodOps
callbacks:

--- a/src/include/storage/aio_internal.h
+++ b/src/include/storage/aio_internal.h
@@ -310,6 +310,9 @@ typedef struct IoMethodOps
/*
* Start executing passed in IOs.
*
+ * Shall advance state to PGAIO_HS_SUBMITTED. (By the time this returns,
+ * other backends might have advanced the state further.)
+ *
* Will not be called if ->needs_synchronous_execution() returned true.
*
* num_staged_ios is <= PGAIO_SUBMIT_BATCH_SIZE.
@@ -321,6 +324,12 @@ typedef struct IoMethodOps
/*
* Wait for the IO to complete. Optional.
*
+ * On return, state shall be PGAIO_HS_COMPLETED_IO,
+ * PGAIO_HS_COMPLETED_SHARED or PGAIO_HS_COMPLETED_LOCAL. (The callback
+ * need not change the state if it's already one of those.) If state is
+ * PGAIO_HS_COMPLETED_IO, state will reach PGAIO_HS_COMPLETED_SHARED
+ * without further intervention.
+ *
* If not provided, it needs to be guaranteed that the IO method calls
* pgaio_io_process_completion() without further interaction by the
* issuing backend.

> Subject: [PATCH v2.11 02/27] aio: Change prefix of PgAioResultStatus values to
> PGAIO_RS_

Ready to commit

> Subject: [PATCH v2.11 03/27] Redefine max_files_per_process to control
> additionally opened files

Ready to commit

> Subject: [PATCH v2.11 04/27] aio: Add liburing dependency

> --- a/meson.build
> +++ b/meson.build
> @@ -944,6 +944,18 @@ endif
>
>
>
> +###############################################################
> +# Library: liburing
> +###############################################################
> +
> +liburingopt = get_option('liburing')
> +liburing = dependency('liburing', required: liburingopt)
> +if liburing.found()
> + cdata.set('USE_LIBURING', 1)
> +endif

This is a different style from other deps; is it equivalent to our standard
style? Example for lz4:

lz4opt = get_option('lz4')
if not lz4opt.disabled()
lz4 = dependency('liblz4', required: false)
# Unfortunately the dependency is named differently with cmake
if not lz4.found() # combine with above once meson 0.60.0 is required
lz4 = dependency('lz4', required: lz4opt,
method: 'cmake', modules: ['LZ4::lz4_shared'],
)
endif

if lz4.found()
cdata.set('USE_LZ4', 1)
cdata.set('HAVE_LIBLZ4', 1)
endif

else
lz4 = not_found_dep
endif

> --- a/configure.ac
> +++ b/configure.ac
> @@ -975,6 +975,14 @@ AC_SUBST(with_readline)
> PGAC_ARG_BOOL(with, libedit-preferred, no,
> [prefer BSD Libedit over GNU Readline])
>
> +#
> +# liburing
> +#
> +AC_MSG_CHECKING([whether to build with liburing support])
> +PGAC_ARG_BOOL(with, liburing, no, [io_uring support, for asynchronous I/O],

Fourth arg generally starts with "build" for args like this. I suggest "build
with io_uring support, for asynchronous I/O". Comparable options:

--with-llvm build with LLVM based JIT support
--with-tcl build Tcl modules (PL/Tcl)
--with-perl build Perl modules (PL/Perl)
--with-python build Python modules (PL/Python)
--with-gssapi build with GSSAPI support
--with-pam build with PAM support
--with-bsd-auth build with BSD Authentication support
--with-ldap build with LDAP support
--with-bonjour build with Bonjour support
--with-selinux build with SELinux support
--with-systemd build with systemd support
--with-libcurl build with libcurl support
--with-libxml build with XML support
--with-libxslt use XSLT support when building contrib/xml2
--with-lz4 build with LZ4 support
--with-zstd build with ZSTD support

> + [AC_DEFINE([USE_LIBURING], 1, [Define to build with io_uring support. (--with-liburing)])])
> +AC_MSG_RESULT([$with_liburing])
> +AC_SUBST(with_liburing)
>
> #
> # UUID library
> @@ -1463,6 +1471,9 @@ elif test "$with_uuid" = ossp ; then
> fi
> AC_SUBST(UUID_LIBS)
>
> +if test "$with_liburing" = yes; then
> + PKG_CHECK_MODULES(LIBURING, liburing)
> +fi

We usually put this right after the AC_MSG_CHECKING ... AC_SUBST block. This
currently has unrelated stuff separating them. Also, with the exception of
icu, we follow PKG_CHECK_MODULES uses by absorbing flags from pkg-config and
use AC_CHECK_LIB to add the actual "-l". By not absorbing flags, I think a
liburing in a nonstandard location would require --with-libraries and
--with-includes, unlike the other PKG_CHECK_MODULES-based dependencies. lz4
is a representative example of our standard:

```
AC_MSG_CHECKING([whether to build with LZ4 support])
PGAC_ARG_BOOL(with, lz4, no, [build with LZ4 support],
[AC_DEFINE([USE_LZ4], 1, [Define to 1 to build with LZ4 support. (--with-lz4)])])
AC_MSG_RESULT([$with_lz4])
AC_SUBST(with_lz4)

if test "$with_lz4" = yes; then
PKG_CHECK_MODULES(LZ4, liblz4)
# We only care about -I, -D, and -L switches;
# note that -llz4 will be added by AC_CHECK_LIB below.
for pgac_option in $LZ4_CFLAGS; do
case $pgac_option in
-I*|-D*) CPPFLAGS="$CPPFLAGS $pgac_option";;
esac
done
for pgac_option in $LZ4_LIBS; do
case $pgac_option in
-L*) LDFLAGS="$LDFLAGS $pgac_option";;
esac
done
fi

# ... later in file ...

if test "$with_lz4" = yes ; then
AC_CHECK_LIB(lz4, LZ4_compress_default, [], [AC_MSG_ERROR([library 'lz4' is required for LZ4 support])])
fi
```

I think it's okay to not use the AC_CHECK_LIB and rely on explicit
src/backend/Makefile code like you've done, but we shouldn't miss
CPPFLAGS/LDFLAGS (or should have a comment on why missing them is right).

> --- a/doc/src/sgml/installation.sgml
> +++ b/doc/src/sgml/installation.sgml

lz4 and other deps have a mention in <sect1 id="install-requirements">, in
addition to sections edited here.

> Subject: [PATCH v2.11 05/27] aio: Add io_method=io_uring

(Still reviewing this one.)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2025-03-23 00:47:31 Re: query_id: jumble names of temp tables for better pg_stat_statement UX
Previous Message Michael Paquier 2025-03-23 00:01:14 Re: Proposal - Allow extensions to set a Plan Identifier