Re: AIO v2.5

From: Andres Freund <andres(at)anarazel(dot)de>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, Antonin Houska <ah(at)cybertec(dot)at>
Subject: Re: AIO v2.5
Date: 2025-03-27 01:07:40
Message-ID: o2dtdrlc2xqpjycsfuvv7u6zyrqj7jfxqu32et3lyelkwce2jy@3k3fwdiiu73v
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Attached v2.13, with the following changes:
Changes:

- Pushed a fair number of commits

A lot of thanks goes to Noah's detailed reviews!

- As Noah pointed out, the zero_damaged_pages warning could be emitted in an
io worker or another backend, but omitted in the backend that started the IO

To address that:

1) I added a new commit "aio: Add WARNING result status"
(itself trivial)

2) I changed buffer_readv_complete() to encode the warning/error in a more
detailed way than before (was_zeroed, first_invalid_off, count_invalid)

As part of that I put the encoding/decoding into a static inline

3) Tracking the number of invalid buffers was awkward with
buffer_readv_complete_one() returning a PgAioResult. Now it just
reports whether it found an invalid page with an out argument.

4) As discussed there now is a different error messages for the case of
multiple invalid pages

The code is a bit awkward to avoid code duplication, curious whether
that's seen as acceptable? I could just duplicate the entire ereport()
instead.

5) The WARNING in the callback is now a LOG, as it will be sent to the
client as a WARNING explicitly when the IO's results are processed

I actually chose LOG_SERVER_ONLY - that seemed slightly better than just
LOG? But not at all sure.

There's a comment explaining this now too.

Noah, I think this set of changes would benefit from another round of
review. I left these changes in "squash-later: " commits, to make it easier
to see / think about.

- Added a comment about the pgaio_result_report() in md_readv_complete(). I
changed it to LOG_SERVER_ONLY as well , but I'm not at all sure about that.

- Previously the buffer completion callback checked zero_damaged_pages - but
that's not right, the GUC hopefully is only set on a per-session basis

I solved that by having AsyncReadBuffers() add ZERO_ON_ERROR to the flags if
zero_damaged_pages is configured.

Also added a comment explaining that we probably should eventually use a
separate flag, so we can adjust the errcode etc differently.

- Explicit test for zero_damaged_pages and ZERO_ON_ERROR

As part of that I made read_rel_block_ll() support reading multiple
blocks. That makes it a lot easier to verify that we handle cases like a
4-block read where 2,3 are invalid correctly.

- I removed the code that "localbuf: Track pincount in BufferDesc as well"
added to ConditionalLockBufferForCleanup() and IsBufferCleanupOK() as discussed

Right now the situations that the code was worried don't exist yet, as we
only support reads.

I added a comment about not needing to worry about that yet to "bufmgr:
Implement AIO read support". And then changed that comment to a FIXME in the
write patches.

- Squashed Thomas' change to make io_concurrency=0 really not use AIO

- Lots of other review comments by Noah addressed

- Merged typo fixes by Thom Brown

TODO:

- There are more tests in test_aio that should be expanded to run for temp
tables as well, not just normal tables

- Add an explicit test for the checksum verification in the completion callback

There is an existing test for testing an invalid page due to page header
verification in test_aio, but not for checksum failures.

I think it's indirectly covered (e.g. in amcheck), but seems better to test
it explicitly.

- Add error context callbacks for io worker and "foreign" IO completion

Greetings,

Andres Freund

Attachment Content-Type Size
v2.13-0001-aio-Implement-support-for-reads-in-smgr-md-fd.patch text/x-diff 23.6 KB
v2.13-0002-docs-Add-acronym-and-glossary-entries-for-I-O-.patch text/x-diff 3.5 KB
v2.13-0003-aio-Add-pg_aios-view.patch text/x-diff 19.4 KB
v2.13-0004-localbuf-Track-pincount-in-BufferDesc-as-well.patch text/x-diff 3.3 KB
v2.13-0005-aio-bufmgr-Comment-fixes.patch text/x-diff 4.8 KB
v2.13-0006-aio-Add-WARNING-result-status.patch text/x-diff 2.2 KB
v2.13-0007-bufmgr-Implement-AIO-read-support.patch text/x-diff 32.9 KB
v2.13-0008-squash-later-bufmgr-Implement-AIO-read-support.patch text/x-diff 9.9 KB
v2.13-0009-bufmgr-Use-AIO-in-StartReadBuffers.patch text/x-diff 26.3 KB
v2.13-0010-squash-later-bufmgr-Use-AIO-in-StartReadBuffer.patch text/x-diff 2.8 KB
v2.13-0011-aio-Add-README.md-explaining-higher-level-desi.patch text/x-diff 19.0 KB
v2.13-0012-squash-later-aio-Add-README.md-explaining-high.patch text/x-diff 1.7 KB
v2.13-0013-aio-Basic-read_stream-adjustments-for-real-AIO.patch text/x-diff 4.9 KB
v2.13-0014-read_stream-Introduce-and-use-optional-batchmo.patch text/x-diff 13.1 KB
v2.13-0015-docs-Reframe-track_io_timing-related-docs-as-w.patch text/x-diff 5.3 KB
v2.13-0016-Enable-IO-concurrency-on-all-systems.patch text/x-diff 11.1 KB
v2.13-0017-aio-Add-test_aio-module.patch text/x-diff 62.7 KB
v2.13-0018-aio-Experimental-heuristics-to-increase-batchi.patch text/x-diff 1.6 KB
v2.13-0019-aio-Implement-smgr-md-fd-write-support.patch text/x-diff 14.6 KB
v2.13-0020-aio-Add-bounce-buffers.patch text/x-diff 23.2 KB
v2.13-0021-bufmgr-Implement-AIO-write-support.patch text/x-diff 11.6 KB
v2.13-0022-aio-Add-IO-queue-helper.patch text/x-diff 7.4 KB
v2.13-0023-bufmgr-use-AIO-in-checkpointer-bgwriter.patch text/x-diff 31.6 KB
v2.13-0024-Ensure-a-resowner-exists-for-all-paths-that-ma.patch text/x-diff 2.6 KB
v2.13-0025-Temporary-Increase-BAS_BULKREAD-size.patch text/x-diff 1.3 KB
v2.13-0026-WIP-Use-MAP_POPULATE.patch text/x-diff 1.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2025-03-27 01:21:39 Re: A small correction to doc and comment of FSM for indexes
Previous Message David Rowley 2025-03-27 00:58:56 Re: Partition pruning on parameters grouped into an array does not prune properly