Re: AIO v2.2

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: AIO v2.2
Date: 2025-01-07 20:09:56
Message-ID: 2cd4f43e-71fa-4d23-b316-06a0d451f8ef@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 07/01/2025 18:11, Andres Freund wrote:
> The difference between a handle and a reference is useful right now, to have
> some separation between the functions that can be called by anyone (taking a
> PgAioHandleRef) and only by the issuer (PgAioHandle). That might better be
> solved by having a PgAioHandleIssuerRef ref or something.
>
> Having PgAioReturn be separate from the AIO handle turns out to be rather
> crucial, otherwise it's very hard to guarantee "forward progress",
> i.e. guarantee that pgaio_io_get() will return something without blocking
> forever.

Right, yeah, I can see that.

>>> typedef enum PgAioHandleState
>>> {
>>> /* not in use */
>>> AHS_IDLE = 0,
>>>
>>> /* returned by pgaio_io_get() */
>>> AHS_HANDED_OUT,
>>>
>>> /* pgaio_io_start_*() has been called, but IO hasn't been submitted yet */
>>> AHS_DEFINED,
>>>
>>> /* subject's prepare() callback has been called */
>>> AHS_PREPARED,
>>>
>>> /* IO has been submitted and is being executed */
>>> AHS_IN_FLIGHT,
>>>
>>> /* IO finished, but result has not yet been processed */
>>> AHS_REAPED,
>>>
>>> /* IO completed, shared completion has been called */
>>> AHS_COMPLETED_SHARED,
>>>
>>> /* IO completed, local completion has been called */
>>> AHS_COMPLETED_LOCAL,
>>> } PgAioHandleState;
>>
>> Do we need to distinguish between DEFINED and PREPARED?
>
> I found it to be rather confusing if it's not possible to tell if some action
> (like the prepare callback) has already happened, or not. It's useful to be
> able look at an IO in a backtrace or such and see exactly in what state it is
> in.

I see.

> In v1 I had several of the above states managed as separate boolean variables
> - that turned out to be a huge mess, it's a lot easier to understand if
> there's a single strictly monotonically increasing state.

Agreed on that

>> I didn't quite understand the point of the prepare callbacks. For example,
>> when AsyncReadBuffers() calls smgrstartreadv(), the
>> shared_buffer_readv_prepare() callback will be called. Why doesn't
>> AsyncReadBuffers() do the "prepare" work itself directly; why does it need
>> to be in a callback?
>
> One big part of it is "ownership" - while the IO isn't completely "assembled",
> we can release all buffer pins etc in case of an error. But if the error
> happens just after the IO was staged, we can't - the buffer is still
> referenced by the IO. For that the AIO subystem needs to take its own pins
> etc. Initially the prepare callback didn't exist, the code in
> AsyncReadBuffers() was a lot more complicated before it.
>
>
>> I assume it's somehow related to error handling, but I didn't quite get
>> it. Perhaps an "abort" callback that'd be called on error, instead of a
>> "prepare" callback, would be better?
>
> I don't think an error callback would be helpful - the whole thing is that we
> basically need claim ownership of all IO related resources IFF the IO is
> staged. Not before (because then the IO not getting staged would mean we have
> a resource leak), not after (because we might error out and thus not keep
> e.g. buffers pinned).

Hmm. The comments say that when you call smgrstartreadv(), the IO handle
may no longer be modified, as the IO may be executed immediately. What
if we changed that so that it never submits the IO, only adds the
necessary callbacks to it?

In that world, when smgrstartreadv() returns, the necessary details and
completion callbacks have been set in the IO handle, but the caller can
still do more preparation before the IO is submitted. The caller must
ensure that it gets submitted, however, so no erroring out in that state.

Currently the call stack looks like this:

AsyncReadBuffers()
-> smgrstartreadv()
-> mdstartreadv()
-> FileStartReadV()
-> pgaio_io_prep_readv()
-> shared_buffer_readv_prepare() (callback)
<- (return)
<- (return)
<- (return)
<- (return)
<- (return)

I'm thinking that the prepare work is done "on the way up" instead:

AsyncReadBuffers()
-> smgrstartreadv()
-> mdstartreadv()
-> FileStartReadV()
-> pgaio_io_prep_readv()
<- (return)
<- (return)
<- (return)
-> shared_buffer_readv_prepare()
<- (return)

Attached is a patch to demonstrate concretely what I mean.

This adds a new pgaio_io_stage() step to the issuer, and the issuer
needs to call the prepare functions explicitly, instead of having them
as callbacks. Nominally that's more steps, but IMHO it's better to be
explicit. The same actions were happening previously too, it was just
hidden in the callback. I updated the README to show that too.

I'm not wedded to this, but it feels a little better to me.

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachment Content-Type Size
aio-remove-prepare-callback.patch text/x-patch 17.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2025-01-07 20:14:15 Re: Statistics Import and Export
Previous Message Robert Haas 2025-01-07 19:59:58 Re: AIO v2.2