From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Aleksander Alekseev <aleksander(at)timescale(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org, Noah Misch <noah(at)leadboat(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, Antonin Houska <ah(at)cybertec(dot)at> |
Subject: | Re: AIO v2.5 |
Date: | 2025-04-01 13:45:23 |
Message-ID: | 3v4fjz62nyvwrulvjgwawqnv2eymdfnbki52lnvu6furoqhrqx@jrpyzttnfh2z |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2025-04-01 14:56:17 +0300, Aleksander Alekseev wrote:
> Hi Andres,
>
> > > I didn't yet push
> > >
> > > > > Subject: [PATCH v2.14 13/29] aio: Add README.md explaining higher level design
>
> I have several notes about 0003 / README.md:
>
> 1. I noticed that the use of "Postgres" and "postgres" is inconsistent.
:/
It probably should be consistent, but I have no idea which of the spellings we
should go for. Either looks ugly in some contexts.
> 2.
>
> ```
> +pgaio_io_register_callbacks(ioh, PGAIO_HCB_SHARED_BUFFER_READV, 0);
> ```
>
> Perhaps I'm a bit late here, but the name of the function is weird. It
> registers a single callback, but the name is "_callbacks".
It registers a *set* of callbacks (stage, complete_shared, complete_local,
report_error) on the handle.
> 3. The use of "AIO Handle" and "AioHandle" is inconsistent.
This seems ok to me.
> 4.
>
> - pgaio_io_register_callbacks
> - pgaio_io_set_handle_data_32
>
> If I understand correctly one can register multiple callbacks per one
> AIO Handle (right? ...). However I don't see an obvious way to match
> handle data to the given callback. If all the callbacks get the same
> handle data... well it's weird IMO, but we should explicitly say that.
There is:
/*
* Associate an array of data with the Handle. This is e.g. useful to the
* transport knowledge about which buffers a multi-block IO affects to
* completion callbacks.
*
* Right now this can be done only once for each IO, even though multiple
* callbacks can be registered. There aren't any known usecases requiring more
* and the required amount of shared memory does add up, so it doesn't seem
* worth multiplying memory usage by PGAIO_HANDLE_MAX_CALLBACKS.
*/
> On top of that we should probably explain in which order the callbacks
> are going to be executed. If there are any guarantees in this respect
> of course.
Alongside PgAioHandleCallbacks:
*
* The latest registered callback is called first. This allows
* higher-level code to register callbacks that can rely on callbacks
* registered by lower-level code to already have been executed.
*
> 5. pgaio_io_set_handle_data_32(ioh, (uint32 *) buffer, 1)
>
> Perhaps it's worth mentioning if `buffer` can be freed after the call
> i.e. if it's stored by value or by reference.
By value.
> It's also worth clarifying if the maximum number of buffers is limited or
> not.
It's limited to PG_IOV_MAX, fwiw.
> 6. It is worth clarifying if AIO allows reads and writes or only reads
> at the moment.
We have patches for writes, I just ran out of time for 18. Im not particularly
excited about adding stuff that then needs to be removed in 19.
> Perhaps it's also worth explicitly saying that AIO is for disk IO only, not
> for network one.
I'd like to integrate network IO too. I have a local prototype, fwiw.
> 7. It is worth clarifying how many times the callbacks are called when
> reading multiple buffers. Is it guaranteed that the callbacks are
> called ones, or if it somehow depends on the implementation, and also
> what happens in the case if I/O succeeds partially.
The aio subsystem doesn't know anything about buffers. Callbacks are executed
exactly once, with the exception of the error reporting callback, which could
be called multiple times.
> 8. I believe we should tell a bit more about the context in which the
> callbacks are called. Particularly what happens to the memory contexts
> and if I can allocate/free memory, can I throw ERRORs, can I create
> new AIO Handles, is it expected that the callback should return
> quickly, are the signals masked while the callback is executed, can I
> use sockets, is it guaranteed that the callback is going to be called
> in the same process (I guess so, but the text doesn't explicitly
> promise that), etc.
There is the following above pgaio_io_register_callbacks()
* Note that callbacks are executed in critical sections. This is necessary
* to be able to execute IO in critical sections (consider e.g. WAL
* logging). To perform AIO we first need to acquire a handle, which, if there
* are no free handles, requires waiting for IOs to complete and to execute
* their completion callbacks.
*
* Callbacks may be executed in the issuing backend but also in another
* backend (because that backend is waiting for the IO) or in IO workers (if
* io_method=worker is used).
And also a bunch of detail along struct PgAioHandleCallbacks.
> 9.
>
> ```
> +Because acquisition of an IO handle
> +[must always succeed](#io-can-be-started-in-critical-sections)
> +and the number of AIO Handles
> +[has to be limited](#state-for-aio-needs-to-live-in-shared-memory)
> +AIO handles can be reused as soon as they have completed.
> ```
>
> What pgaio_io_acquire() does if we are out of AIO Handles? Since it
> always succeeds I guess it should block the caller in this case, but
> IMO we should say this explicitly.
That's documented above pgaio_io_acquire().
> 10.
>
> > > because I want to integrate some language that could be referenced by
> > > smgrstartreadv() (and more in the future), as we have been talking about.
> >
> > I tried a bunch of variations and none of them seemed great. So I ended up
> > with a lightly polished version of your suggested comment above
> > smgrstartreadv(). We can later see about generalizing it.
>
> IMO the problem here is that README doesn't show the code that does IO
> per se, and thus doesn't give the full picture of how AIO should be
> used. Perhaps instead of referencing smgrstartreadv() it would be
> better to provide a simple but complete example, one that opens a
> binary file and reads 512 bytes from it by the given offset for
> instance.
IMO the example is already long enough, if you want that level of detail, you
can just look at smgrstartreadv() etc. The idea about explaining it at that
level is that that is basically what is required to use AIO in a new place,
whereas implementing AIO for a new target, or a new IO operation, requires a
bit more care.
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Daniel Gustafsson | 2025-04-01 13:47:59 | Re: pgsql: Add support for OAUTHBEARER SASL mechanism |
Previous Message | torikoshia | 2025-04-01 13:38:23 | Re: Proposal: Progressive explain |