From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: AIO v2.2 |
Date: | 2025-01-09 00:26:36 |
Message-ID: | fsauvxs3xzgqsowpu4cyon5pj4nwzfejbazsd5aqbd5t3qxi6p@fklsi6bpmniw |
Lists: | pgsql-hackers |
Hi,
On 2025-01-07 14:59:58 -0500, Robert Haas wrote:
> On Tue, Jan 7, 2025 at 11:11 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > The difference between a handle and a reference is useful right now, to have
> > some separation between the functions that can be called by anyone (taking a
> > PgAioHandleRef) and only by the issuer (PgAioHandle). That might better be
> > solved by having a PgAioHandleIssuerRef ref or something.
>
> To me, those names don't convey that.
I'm certainly not wedded to these names - I went back and forth between
different names a fair bit, because I wasn't quite happy. I am however certain
that the current names are better than what they used to be (PgAioInProgress
and, because that's long, a bunch of PgAioIP* names) :)
To make sure we're talking about the same things, I am thinking of the
following "entities" needing names:
1) Shared memory representation of an IO, for the AIO subsystem internally
Currently: PgAioHandle
Because shared memory is limited, we need to reuse this entity. This reuse
needs to be possible "immediately" after completion, to avoid a bunch of
nasty scenarios.
To distinguish a reused PgAioHandle from its "prior" incarnation, each
PgAioHandle has a 64bit "generation" counter.
In addition to being referenceable via pointer, it's also possible to
assign a 32bit integer to each PgAioHandle, as there is a fixed number of
them.
2) A way for the issuer of an IO to reference 1), to attach information to the
IO
Currently: PgAioHandle*
As long as the issuer hasn't yet staged the IO, it can't be
reused. Therefore it's OK to just point to the PgAioHandle.
One disadvantage of just using a plain PgAioHandle* is that it's harder to
distinguish subsystem-internal functions that accept PgAioHandle* from
"public" functions that accept the "issuer reference".
3) A way for any backend to wait for a specific IO to complete
Currently: PgAioHandleRef
This references 1) using a 32 bit ID and the 64bit generation.
This is used to allow any backend to wait for a specific IO to
complete. E.g. by including it in the BufferDesc so that WaitIO can wait
for it.
Because it includes the generation it's trivial to detect whether the
PgAioHandle was reused.
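To illustrate how 1) and 3) fit together, here is a minimal sketch. It is not the code from the patch: all Demo* names, the pool size and the in_flight flag are made up for the example. It only shows why an ID plus generation makes reuse detectable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_HANDLES 4      /* hypothetical fixed pool size */

/* 1) shared-memory representation of an IO (heavily simplified) */
typedef struct DemoAioHandle
{
    uint64_t    generation;     /* bumped every time the slot is recycled */
    bool        in_flight;      /* stand-in for the real state machine */
} DemoAioHandle;

/* 3) what any backend may hold on to: slot ID plus generation */
typedef struct DemoAioWaitRef
{
    uint32_t    id;
    uint64_t    generation;
} DemoAioWaitRef;

static DemoAioHandle demo_handles[DEMO_MAX_HANDLES];

/* Build a wait reference for the handle's current incarnation. */
static DemoAioWaitRef
demo_get_wait_ref(const DemoAioHandle *h)
{
    DemoAioWaitRef ref;

    ref.id = (uint32_t) (h - demo_handles);
    ref.generation = h->generation;
    return ref;
}

/* The IO finished; the slot becomes reusable right away. */
static void
demo_complete_and_recycle(DemoAioHandle *h)
{
    h->in_flight = false;
    h->generation++;
}

/* Is the IO this reference was created for still in progress? */
static bool
demo_ref_is_in_progress(DemoAioWaitRef ref)
{
    const DemoAioHandle *h;

    assert(ref.id < DEMO_MAX_HANDLES);
    h = &demo_handles[ref.id];

    /* A generation mismatch means the slot was reused, so "our" IO is done. */
    if (h->generation != ref.generation)
        return false;
    return h->in_flight;
}
```

A waiter holding a stale reference never confuses a later IO in the same slot with the one it cared about; it just sees a generation mismatch and treats its IO as complete.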
> I would perhaps call the thing that supports issuer-only operations a
> "PgAio" and the thing other people can use a "PgAioHandle". Or
> "PgAioRequest" and "PgAioHandle" or something like that. With
> PgAioHandleRef, IMHO you've got two words that both imply a layer of
> indirection -- "handle" and "ref" -- which doesn't seem quite as nice,
> because then the other thing -- "PgAioHandle" still sort of implies one
> layer of indirection and the whole thing seems a bit less clear.
It's indirections all the way down. The PG representation of "one IO" in the
end is just an indirection for a kernel operation :)
I would like to split 1) and 2) above.
1) PgAio{Handle,Request,} (a large struct) - used internally by AIO subsystem,
"pointed to" by the following
2) PgAioIssuerRef (an ID or pointer) - used by the issuer to incrementally
define the IO
3) PgAioWaitRef - (an ID and generation) - used to wait for a specific IO to
complete, not affected by reuse of PgAioHandle
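A rough sketch of what splitting 1) and 2) could buy us, assuming the issuer reference is a one-member struct wrapping the pointer. The Demo* names and the op/offset fields are invented for illustration; the point is only that the compiler then keeps internal and issuer-facing functions apart.

```c
#include <stddef.h>
#include <stdint.h>

/* 1) internal representation, only touched by the AIO subsystem */
typedef struct DemoAioHandle
{
    uint64_t    generation;
    int         op;             /* hypothetical: which operation to perform */
    uint64_t    offset;         /* hypothetical: where to perform it */
} DemoAioHandle;

/* 2) issuer reference: a distinct type, not interchangeable with the pointer */
typedef struct DemoAioIssuerRef
{
    DemoAioHandle *handle;
} DemoAioIssuerRef;

/* Subsystem-internal: takes the raw handle. */
static void
demo_internal_reset(DemoAioHandle *h)
{
    h->op = 0;
    h->offset = 0;
}

/* Hand a (not yet staged) handle to its issuer. */
static DemoAioIssuerRef
demo_acquire(DemoAioHandle *slot)
{
    DemoAioIssuerRef ref;

    demo_internal_reset(slot);
    ref.handle = slot;
    return ref;
}

/*
 * "Public", issuer-only: incrementally define the IO. Passing a bare
 * DemoAioHandle * here is a compile-time type error.
 */
static void
demo_issuer_set_target(DemoAioIssuerRef ref, int op, uint64_t offset)
{
    ref.handle->op = op;
    ref.handle->offset = offset;
}
```

Whether the wrapper holds a pointer or an ID is an implementation detail; either way the function signatures document who is allowed to call what.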
> > > REAPED feels like a bad name. It sounds like a later stage than COMPLETED,
> > > but it's actually vice versa.
> >
> > What would you call having gotten "completion notifications" from the kernel,
> > but not having processed them?
>
> The Linux kernel calls those zombie processes, so we could call it a ZOMBIE
> state, but that seems like it might be a bit of inside baseball.
ZOMBIE feels even later than REAPED to me :)
> I do agree with Heikki that REAPED sounds later than COMPLETED, because you
> reap zombie processes by collecting their exit status. Maybe you could have
> AHS_COMPLETE or AHS_IO_COMPLETE for the state where the I/O is done but
> there's still completion-related work to be done, and then the other state
> could be AHS_DONE or AHS_FINISHED or AHS_FINAL or AHS_REAPED or something.
How about
AHS_COMPLETE_KERNEL or AHS_COMPLETE_RAW - raw syscall completed
AHS_COMPLETE_SHARED_CB - shared callback completed
AHS_COMPLETE_LOCAL_CB - local callback completed
?
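Spelled out as an enum, assuming the states are reached in the order listed above (the enum type name and the helper are made up for the example, and earlier states are omitted):

```c
#include <stdbool.h>

typedef enum DemoAioHandleState
{
    AHS_COMPLETE_RAW,           /* raw syscall completed, nothing processed yet */
    AHS_COMPLETE_SHARED_CB,     /* shared callback completed */
    AHS_COMPLETE_LOCAL_CB       /* local callback completed */
} DemoAioHandleState;

/* Has the handle reached at least the given completion stage? */
static bool
demo_state_at_least(DemoAioHandleState cur, DemoAioHandleState wanted)
{
    return cur >= wanted;
}
```

With names like these there is no need to remember whether "reaped" comes before or after "completed"; each name says which step has finished.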
Greetings,
Andres Freund