Re: AIO v2.2

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: AIO v2.2
Date: 2025-01-07 15:09:58
Message-ID: b7f6bd4a-d794-44d5-b6a7-314395939e2c@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 01/01/2025 06:03, Andres Freund wrote:
> Hi,
>
> Attached is a new version of the AIO patchset.

I haven't gone through it all yet, but some comments below.

> The biggest changes are:
>
> - The README has been extended with an overview of the API. I think it gives a
> good overview of how the API fits together. I'd be very good to get
> feedback from folks that aren't as familiar with AIO, I can't really see
> what's easy/hard anymore.

Thanks, the README is super helpful! I was overwhelmed by all the new
concepts before, now it all makes much more sense.

Now that it's all laid out more clearly, I see how many different
concepts and states there really are:

- For a single IO, there is an "IO handle", "IO references", and an "IO
return". You first allocate an IO handle (PgAioHandle), and then you get
a reference (PgAioHandleRef) and an "IO return" (PgAioReturn) struct for it.

- An IO handle has eight different states (PgAioHandleState).

I'm sure all those concepts exist for a reason. But still I wonder: can
we simplify?

pgaio_io_get() and pgaio_io_release() are a bit asymmetric, I'd suggest
pgaio_io_acquire() or similar. "get" also feels very innocent, even
though it may wait for previous IO to finish. Especially when
pgaio_io_get_ref() actually is innocent.

> typedef enum PgAioHandleState
> {
> /* not in use */
> AHS_IDLE = 0,
>
> /* returned by pgaio_io_get() */
> AHS_HANDED_OUT,
>
> /* pgaio_io_start_*() has been called, but IO hasn't been submitted yet */
> AHS_DEFINED,
>
> /* subject's prepare() callback has been called */
> AHS_PREPARED,
>
> /* IO has been submitted and is being executed */
> AHS_IN_FLIGHT,
>
> /* IO finished, but result has not yet been processed */
> AHS_REAPED,
>
> /* IO completed, shared completion has been called */
> AHS_COMPLETED_SHARED,
>
> /* IO completed, local completion has been called */
> AHS_COMPLETED_LOCAL,
> } PgAioHandleState;

Do we need to distinguish between DEFINED and PREPARED? At quick glance,
those states are treated the same. (The comment refers to
pgaio_io_start_*() functions, but there's no such thing)

I didn't quite understand the point of the prepare callbacks. For
example, when AsyncReadBuffers() calls smgrstartreadv(), the
shared_buffer_readv_prepare() callback will be called. Why doesn't
AsyncReadBuffers() do the "prepare" work itself directly; why does it
need to be in a callback? I assume it's somehow related to error
handling, but I didn't quite get it. Perhaps an "abort" callback that'd
be called on error, instead of a "prepare" callback, would be better?

There are some synonyms used in the code: I think "in-flight" and
"submitted" mean the same thing. And "prepared" and "staged". I'd
suggest picking just one term for each concept.

I didn't understand the COMPLETED_SHARED and COMPLETED_LOCAL states.
does a single IO go through both states, or are the mutually exclusive?
At quick glance, I don't actually see any code that would set the
COMPLETED_LOCAL state; is it dead code?

REAPED feels like a bad name. It sounds like a later stage than
COMPLETED, but it's actually vice versa.

I'm a little surprised that the term "IO request" isn't used anywhere. I
have no concrete suggestion, but perhaps that would be a useful term.

> - Retries for partial IOs (i.e. short reads) are now implemented. Turned out
> to take all of three lines and adding one missing variable initialization.

:-)

> - There's no obvious way to tell "internal" function operating on an IO handle
> apart from functions that are expected to be called by the issuer of an IO.
>
> One way to deal with this would be to introduce a distinct "issuer IO
> reference" type. I think that might be a good idea, it would also make it
> clearer that a good number of the functions can only be called by the
> issuer, before the IO is submitted.
>
> This would also make it easier to order functions more sensibly in aio.c, as
> all the issuer functions would be together.
>
> The functions on AIO handles that everyone can call already have a distinct
> type (PgAioHandleRef vs PgAioHandle*).

Hmm, yeah I think you might be onto something here.

Could pgaio_io_get() return an PgAioHandleRef directly, so that the
issuer would never see a raw PgAioHandle ?

Finally, attached are a couple of typos and other trivial suggestions.

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachment Content-Type Size
aio-typos.patch text/x-patch 4.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2025-01-07 15:26:51 Re: Temporary Views Cleanup Issue
Previous Message Andrey M. Borodin 2025-01-07 14:15:22 Re: Sample rate added to pg_stat_statements