Re: AIO v2.5

From: Andres Freund <andres(at)anarazel(dot)de>
To: Antonin Houska <ah(at)cybertec(dot)at>
Cc: Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>
Subject: Re: AIO v2.5
Date: 2025-03-14 19:43:15
Message-ID: 3yxd5r23zly5bytvgyktbxtxq2r3gbpi7xd4dugevh3h4w4q6c@lu6oatjjpltz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2025-03-13 11:53:03 +0100, Antonin Houska wrote:
> Attached are a few proposals for minor comment fixes.

Thanks, applied.

> Besides that, it occurred to me when I was trying to get familiar with the
> patch set (respectable work, btw) that an additional Assert() statement could
> make sense:

Yea, it does. I added it to another place as well.

Attached is v2.8 with the following changes:

- I wasn't happy with the way StartReadBuffers(), WaitReadBuffers() and
AsyncReadBuffers() interacted. The io_method=sync path in particular was too
cute by half, calling WaitReadBuffers() from within WaitReadBuffers().

I think the new state considerably better.

Plenty other smaller changes as part of that. One worth calling out is that
ReadBuffersCanStartIO() now submits staged IO before actually blocking. Not
the prettiest code, but I think it's ok.

- Added a function to assert the sanitiy of a ReadBuffersOperation

While doing the refactoring for the prior point, I temporarily had a bug
that returned buffers for which IO wasn't actually performed. Surprisingly
the only assertion that triggered was when that buffer was read again by
another operation, because it had been marked dirty, despite never being
valid.

Now there's a function that can be used to check that the buffers referenced
by a ReadBuffersOperation are in a sane state.

I guess this could be committed independently, but it'd not be entirely
trivial to extract, so I'm currently leaning against doing that.

- Previously VacuumCostActive accounting happened after IO completion. But
that doesn't seem quite right, it'd allow to submit a lot of IO at
once. It's now moved to AsyncReadBuffers()

- With io_method=sync or with worker and temp tables, smgrstartreadv() would
actually execute the IO. But the time accounting was done entirely around
pgaio_wref_wait(). Now it's done in both places.

- Rebased onto newer version of Thomas' read_stream.c changes

With that newer version the integration with read stream for actually doing
AIO is a bit simpler. There's one FIXME in the patch, because I don't
really understand what a comment is referring to.

I also split out a more experimental patch to make more efficient use of
batching in read stream, the heuristics are more complicated, and it works
well enough without.

- I added a commit to clean up the buffer access accounting for the case that
a buffer was read in concurrently. That IMO is somewhat bogus on master, and
it seemed to get more bogus with AIO.

- Integrated Antonin Houska's fixes and Assert suggestion

- Added a patch to address the smgr.c/md.c interrupt issue (a problem in master), see
https://postgr.es/m/3vae7l5ozvqtxmd7rr7zaeq3qkuipz365u3rtim5t5wdkr6f4g@vkgf2fogjirl

I think the reasonable next steps are:

- Commit "localbuf: *" commits

- Commit temp table tests, likely after lowering the minimum temp_buffers setting

- Pursue a fix of the smgr interupt issue on the referenced thread

This can happen in parallel with AIO patches up to
"aio: Implement support for reads in smgr/md/fd"

- Commit the core AIO infrastructure patch after doing a few more passes

- Commit IO worker support

- In parallel: Find a way to deal with the set_max_safe_fds() issue that we've
been discussing on this thread recently. As that only affects io_uring, it
doesn't have to block other patches going in.

- Do a round of review of the read_stream changes Thomas recently posted (and
that are also included here)

- Try to get some more review for the bufmgr.c related changes. I've whacked
them around a fair bit lately.

- Try to get Thomas to review my read_stream.c changes

Open items:

- The upstream BAS_BULKREAD is so small that throughput is substantially worse
once a table reaches 1/4 shared_buffers. That patch in the patchset as-is is
probably not good enough, although I am not sure about that

- The set_max_safe_fds() issue for io_uring

- Right now effective_io_concurrency cannot be set > 0 on Windows and other
platforms that lack posix_fadvise. But with AIO we can read ahead without
posix_fadvise().

It'd not really make anything worse than today to not remove the limit, but
it'd be pretty weird to prevent windows etc from benefiting from AIO. Need
to look around and see whether it would require anything other than doc
changes.

Greetings,

Andres Freund

Attachment Content-Type Size
v2.8-0001-aio-Basic-subsystem-initialization.patch text/x-diff 25.1 KB
v2.8-0002-aio-Add-core-asynchronous-I-O-infrastructure.patch text/x-diff 89.8 KB
v2.8-0003-aio-Infrastructure-for-io_method-worker.patch text/x-diff 24.9 KB
v2.8-0004-aio-Add-io_method-worker.patch text/x-diff 20.2 KB
v2.8-0005-aio-Add-liburing-dependency.patch text/x-diff 13.4 KB
v2.8-0006-aio-Add-io_method-io_uring.patch text/x-diff 20.2 KB
v2.8-0007-smgr-Hold-interrupts-in-most-smgr-functions.patch text/x-diff 9.4 KB
v2.8-0008-aio-Implement-support-for-reads-in-smgr-md-fd.patch text/x-diff 21.9 KB
v2.8-0009-aio-Add-README.md-explaining-higher-level-desig.patch text/x-diff 19.0 KB
v2.8-0010-aio-Add-pg_aios-view.patch text/x-diff 9.7 KB
v2.8-0011-Improve-read_stream.c-advice-for-big-random-chu.patch text/x-diff 7.6 KB
v2.8-0012-Simplify-distance-heuristics-in-read_stream.c.patch text/x-diff 7.4 KB
v2.8-0013-Support-buffer-forwarding-in-read_stream.c.patch text/x-diff 8.7 KB
v2.8-0014-Support-buffer-forwarding-in-StartReadBuffers.patch text/x-diff 10.5 KB
v2.8-0015-tests-Expand-temp-table-tests-to-some-pin-relat.patch text/x-diff 9.6 KB
v2.8-0016-localbuf-Fix-dangerous-coding-pattern-in-GetLoc.patch text/x-diff 2.9 KB
v2.8-0017-localbuf-Introduce-InvalidateLocalBuffer.patch text/x-diff 6.0 KB
v2.8-0018-localbuf-Introduce-TerminateLocalBufferIO.patch text/x-diff 5.8 KB
v2.8-0019-localbuf-Introduce-FlushLocalBuffer.patch text/x-diff 5.4 KB
v2.8-0020-localbuf-Introduce-StartLocalBufferIO.patch text/x-diff 4.9 KB
v2.8-0021-localbuf-Track-pincount-in-BufferDesc-as-well.patch text/x-diff 4.5 KB
v2.8-0022-bufmgr-Implement-AIO-read-support.patch text/x-diff 25.4 KB
v2.8-0023-bufmgr-Improve-stats-when-buffer-was-read-in-co.patch text/x-diff 3.0 KB
v2.8-0024-bufmgr-Use-AIO-in-StartReadBuffers.patch text/x-diff 26.9 KB
v2.8-0025-docs-Reframe-track_io_timing-related-docs-as-wa.patch text/x-diff 5.3 KB
v2.8-0026-aio-Basic-read_stream-adjustments-for-real-AIO.patch text/x-diff 4.5 KB
v2.8-0027-aio-Experimental-heuristics-to-increase-batchin.patch text/x-diff 1.6 KB
v2.8-0028-aio-Add-test_aio-module.patch text/x-diff 42.4 KB
v2.8-0029-aio-Implement-smgr-md-fd-write-support.patch text/x-diff 12.7 KB
v2.8-0030-aio-Add-bounce-buffers.patch text/x-diff 23.1 KB
v2.8-0031-bufmgr-Implement-AIO-write-support.patch text/x-diff 9.9 KB
v2.8-0032-aio-Add-IO-queue-helper.patch text/x-diff 7.4 KB
v2.8-0033-bufmgr-use-AIO-in-checkpointer-bgwriter.patch text/x-diff 31.1 KB
v2.8-0034-Ensure-a-resowner-exists-for-all-paths-that-may.patch text/x-diff 2.6 KB
v2.8-0035-Temporary-Increase-BAS_BULKREAD-size.patch text/x-diff 1.3 KB
v2.8-0036-WIP-Use-MAP_POPULATE.patch text/x-diff 1.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Álvaro Herrera 2025-03-14 19:49:13 Re: Doc: fix the rewrite condition when executing ALTER TABLE ADD COLUMN
Previous Message Álvaro Herrera 2025-03-14 19:38:18 Re: lwlocknames.h beautification attempt