Re: AIO v2.5

From: Andres Freund <andres(at)anarazel(dot)de>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, pgsql-hackers(at)postgresql(dot)org, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>
Subject: Re: AIO v2.5
Date: 2025-03-25 18:58:37
Message-ID: 5ons2rtmwarqqhhexb3dnqulw5rjgwgoct57vpdau4rujlrffj@3fls6d2mkiwc
Lists: pgsql-hackers

Hi,

On 2025-03-25 08:58:08 -0700, Noah Misch wrote:
> While having nagging thoughts that we might be releasing FDs before io_uring
> gets them into kernel custody, I tried this hack to maximize FD turnover:
>
> static void
> ReleaseLruFiles(void)
> {
> #if 0
> 	while (nfile + numAllocatedDescs + numExternalFDs >= max_safe_fds)
> 	{
> 		if (!ReleaseLruFile())
> 			break;
> 	}
> #else
> 	while (ReleaseLruFile())
> 		;
> #endif
> }
>
> "make check" with default settings (io_method=worker) passes, but
> io_method=io_uring in the TEMP_CONFIG file got different diffs in each of two
> runs. s/#if 0/#if 1/ (restore normal FD turnover) removes the failures.
> Here's the richer of the two diffs:

Yikes. That's a very good catch.

I spent a bit of time debugging this. I think I see what's going on - it turns
out that the kernel does *not* open the FDs during io_uring_enter() if
IOSQE_ASYNC is specified [1]. We add that flag heuristically, in an attempt to
avoid a small but measurable slowdown for sequential scans that are fully
buffered (cf. pgaio_uring_submit()). If I disable that heuristic, your patch
above passes all tests here.

I don't know if that's an intentional or unintentional behavioral difference.

There are 2 1/2 ways around this:

1) Stop using the IOSQE_ASYNC heuristic
2a) Wait for all in-flight IOs when any FD gets closed
2b) Wait for all in-flight IOs using an FD when that FD gets closed

Given that we have clear evidence that io_uring doesn't completely support
closing FDs while IOs are in flight, be it a bug or intentional, it seems
clearly better to go for 2a or 2b.
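
To make the difference between 2a and 2b concrete, here is a minimal,
self-contained sketch. It is only an illustration under stated assumptions:
the table, the function names (io_submitted, wait_for_io, drain_ios_for_fd,
drain_all_ios, count_inflight) and the fixed-size array are all hypothetical
and are not taken from fd.c or the AIO patches. The "wait" step is simulated;
in real code it would block until the kernel reports the completion.

```c
#include <stdbool.h>

#define MAX_INFLIGHT 8

/* Hypothetical bookkeeping: one slot per submitted-but-uncompleted IO. */
typedef struct InflightIo
{
	bool		in_use;
	int			fd;
} InflightIo;

static InflightIo inflight[MAX_INFLIGHT];

/* Record that an IO referencing fd was submitted; returns slot, or -1 if full. */
static int
io_submitted(int fd)
{
	for (int i = 0; i < MAX_INFLIGHT; i++)
	{
		if (!inflight[i].in_use)
		{
			inflight[i].in_use = true;
			inflight[i].fd = fd;
			return i;
		}
	}
	return -1;
}

/* Simulated stand-in for blocking until one IO has completed. */
static void
wait_for_io(int slot)
{
	inflight[slot].in_use = false;
}

/* Number of IOs that are still in flight. */
static int
count_inflight(void)
{
	int			n = 0;

	for (int i = 0; i < MAX_INFLIGHT; i++)
		if (inflight[i].in_use)
			n++;
	return n;
}

/* 2b: before closing fd, wait only for IOs that reference fd. */
static int
drain_ios_for_fd(int fd)
{
	int			waited = 0;

	for (int i = 0; i < MAX_INFLIGHT; i++)
	{
		if (inflight[i].in_use && inflight[i].fd == fd)
		{
			wait_for_io(i);
			waited++;
		}
	}
	/* the actual close(fd) would only happen after this point */
	return waited;
}

/* 2a: before closing any FD, wait for every in-flight IO. */
static int
drain_all_ios(void)
{
	int			waited = 0;

	for (int i = 0; i < MAX_INFLIGHT; i++)
	{
		if (inflight[i].in_use)
		{
			wait_for_io(i);
			waited++;
		}
	}
	return waited;
}
```

With IOs in flight on FDs 3, 4 and 3, drain_ios_for_fd(3) waits for two IOs
and leaves the one on FD 4 alone, whereas drain_all_ios() would wait for all
three. 2a needs no per-FD lookup but stalls unrelated IO; 2b is more targeted
at the cost of having to know which FD each in-flight IO uses.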

Greetings,

Andres Freund

[1] Instead, files are opened when the queue entry is being worked on.
Interestingly, that only happens when the IO is *explicitly*
requested to be executed in the workqueue with IOSQE_ASYNC, not when it's
put there because it couldn't be done in a non-blocking way.
