From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Noah Misch <noah(at)leadboat(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, pgsql-hackers(at)postgresql(dot)org, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl> |
Subject: | Re: AIO v2.5 |
Date: | 2025-03-24 01:11:24 |
Message-ID: | CA+hUKGKwV7MccEL+atTwwX2Pazo1h8M_ZChzKKMp7pz258uWow@mail.gmail.com |
Lists: | pgsql-hackers |
On Mon, Mar 24, 2025 at 5:59 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2025-03-23 08:55:29 -0700, Noah Misch wrote:
> > An IO in PGAIO_HS_STAGED clearly blocks closing the IO's FD, and an IO in
> > PGAIO_HS_COMPLETED_IO clearly doesn't block that close. For io_method=worker,
> > closing in PGAIO_HS_SUBMITTED is okay. For io_method=io_uring, is there a
> > reference about it being okay to close during PGAIO_HS_SUBMITTED? I looked
> > awhile for an authoritative view on that, but I didn't find one. If we can
> > rely on io_uring_submit() returning only after the kernel has given the
> > io_uring its own reference to all applicable file descriptors, I expect it's
> > okay to close the process's FD. If the io_uring acquires its reference later
> > than that, I expect we shouldn't close before that later time.
>
> I'm fairly sure io_uring has its own reference for the file descriptor by the
> time io_uring_enter() returns [1]. What io_uring does *not* reliably tolerate
> is the issuing process *exiting* before the IO completes, even if there are
> other processes attached to the same io_uring instance.
It is a bit strange that the documentation doesn't say that
explicitly. You can sorta-maybe-kinda infer it from the fact that
io_uring didn't originally support cancelling requests at all, maybe a
small clue that it also didn't cancel them when you closed the fd :-)
The only sane alternative would seem to be that they keep running and
have their own reference to the *file* (not the fd), which is the
actual case, and might also be inferrable at a stretch from the
io_uring_register() documentation that says it reduces overheads with
a "long term reference" reducing "per-I/O overhead". (The distant
third option/non-option is a sort of late/async binding fd as seen in
the Glibc user space POSIX AIO implementation, but that sort of
madness doesn't seem to be the sort of thing anyone working in the
kernel would entertain for a nanosecond...) Anyway, there are also
public discussions involving Mr Axboe that cover the fact that async
operations continue to run when the associated fd is closed, e.g. from
people who were surprised by that when porting stuff from other
systems, which might help fill in the documentation gap a teensy bit
if people want to see something outside the source code:
https://github.com/axboe/liburing/issues/568
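To make the file-versus-fd distinction concrete, here is a minimal sketch in plain POSIX C. It does not use io_uring at all: dup() stands in for "something else holds its own reference to the open file", which is only an analogy for what the kernel does on submission. The helper name read_after_close() and the temporary path template are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write msg to a fresh temporary file, take a second
 * reference to the open file with dup(), close the original descriptor, and
 * read the data back through the surviving reference.
 *
 * Returns the number of bytes read, or -1 on error.
 */
ssize_t
read_after_close(const char *msg, char *buf, size_t buflen)
{
    char        path[] = "/tmp/fd_vs_file_XXXXXX";
    size_t      len = strlen(msg);
    int         fd;
    int         ref;
    ssize_t     nread;

    assert(buflen > len);       /* room for the data plus a terminator */

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);               /* the name goes away, the open file stays */

    if (write(fd, msg, len) != (ssize_t) len)
    {
        close(fd);
        return -1;
    }

    ref = dup(fd);              /* second reference to the same open file */
    if (ref < 0)
    {
        close(fd);
        return -1;
    }

    close(fd);                  /* drops one descriptor, not the file */

    nread = pread(ref, buf, buflen - 1, 0);
    if (nread >= 0)
        buf[nread] = '\0';

    close(ref);                 /* last reference: now the file is released */
    return nread;
}
```

The read succeeds after close(fd) because the open file outlives any single descriptor that points at it. That is the same property that makes closing the process's fd harmless once io_uring holds its own reference to the file.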
> AIO v1 had a posix_aio backend, which, on several platforms, did *not*
> tolerate the FD being closed before the IO completes. Because of that
> IoMethodOps had a closing_fd callback, which posix_aio used to wait for the
> IO's completion [2].
Just for the record while remembering this stuff: Windows is another
system that took the cancel-on-close approach, so the Windows IOCP
proof-of-concept patches also used that AIO v1 callback and we'll have
to think about that again if/when we want to get that stuff
going on AIO v2. I recall also speculating that, for the benefit of
those cancel-on-close systems, it might be better to teach the vfd
system to pick another victim to close if an fd is currently tied up
with an asynchronous I/O, hopefully without any happy-path
book-keeping. But just submitting staged I/O is a nice and cheap
solution for now, without them in the picture.
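For what it's worth, the "pick another victim" idea could be sketched roughly as below. This is not PostgreSQL's vfd code: DemoVfd, its fields and demo_pick_close_victim() are all hypothetical. The per-entry in-flight counter is exactly the kind of book-keeping one would hope to avoid on the happy path; it is only there to make the selection rule visible.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a vfd cache entry. */
typedef struct DemoVfd
{
    int             fd;             /* kernel descriptor, or -1 if not open */
    unsigned long   last_used;      /* smaller means less recently used */
    int             ios_in_flight;  /* async I/Os still tied to this fd */
} DemoVfd;

/*
 * Return the index of the least recently used open entry that has no
 * asynchronous I/O in flight, or -1 if every open entry is busy.
 */
int
demo_pick_close_victim(const DemoVfd *vfds, size_t n)
{
    int         victim = -1;

    assert(vfds != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        /* Skip entries that are already closed or busy with an I/O. */
        if (vfds[i].fd < 0 || vfds[i].ios_in_flight > 0)
            continue;

        if (victim < 0 || vfds[i].last_used < vfds[victim].last_used)
            victim = (int) i;
    }

    return victim;
}
```

On a cancel-on-close system, a -1 result would mean the caller has to wait for some I/O to complete before it can free up a descriptor.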