From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Noah Misch <noah(at)leadboat(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, Antonin Houska <ah(at)cybertec(dot)at> |
Subject: | Re: AIO v2.5 |
Date: | 2025-03-27 01:07:40 |
Message-ID: | o2dtdrlc2xqpjycsfuvv7u6zyrqj7jfxqu32et3lyelkwce2jy@3k3fwdiiu73v |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
Attached v2.13, with the following changes:
Changes:
- Pushed a fair number of commits
A lot of thanks goes to Noah's detailed reviews!
- As Noah pointed out, the zero_damaged_pages warning could be emitted in an
io worker or another backend, but omitted in the backend that started the IO
To address that:
1) I added a new commit "aio: Add WARNING result status"
(itself trivial)
2) I changed buffer_readv_complete() to encode the warning/error in a more
detailed way than before (was_zeroed, first_invalid_off, count_invalid)
As part of that I put the encoding/decoding into a static inline
3) Tracking the number of invalid buffers was awkward with
buffer_readv_complete_one() returning a PgAioResult. Now it just
reports whether it found an invalid page with an out argument.
4) As discussed there now is a different error messages for the case of
multiple invalid pages
The code is a bit awkward to avoid code duplication, curious whether
that's seen as acceptable? I could just duplicate the entire ereport()
instead.
5) The WARNING in the callback is now a LOG, as it will be sent to the
client as a WARNING explicitly when the IO's results are processed
I actually chose LOG_SERVER_ONLY - that seemed slightly better than just
LOG? But not at all sure.
There's a comment explaining this now too.
Noah, I think this set of changes would benefit from another round of
review. I left these changes in "squash-later: " commits, to make it easier
to see / think about.
- Added a comment about the pgaio_result_report() in md_readv_complete(). I
changed it to LOG_SERVER_ONLY as well , but I'm not at all sure about that.
- Previously the buffer completion callback checked zero_damaged_pages - but
that's not right, the GUC hopefully is only set on a per-session basis
I solved that by having AsyncReadBuffers() add ZERO_ON_ERROR to the flags if
zero_damaged_pages is configured.
Also added a comment explaining that we probably should eventually use a
separate flag, so we can adjust the errcode etc differently.
- Explicit test for zero_damaged_pages and ZERO_ON_ERROR
As part of that I made read_rel_block_ll() support reading multiple
blocks. That makes it a lot easier to verify that we handle cases like a
4-block read where 2,3 are invalid correctly.
- I removed the code that "localbuf: Track pincount in BufferDesc as well"
added to ConditionalLockBufferForCleanup() and IsBufferCleanupOK() as discussed
Right now the situations that the code was worried don't exist yet, as we
only support reads.
I added a comment about not needing to worry about that yet to "bufmgr:
Implement AIO read support". And then changed that comment to a FIXME in the
write patches.
- Squashed Thomas' change to make io_concurrency=0 really not use AIO
- Lots of other review comments by Noah addressed
- Merged typo fixes by Thom Brown
TODO:
- There are more tests in test_aio that should be expanded to run for temp
tables as well, not just normal tables
- Add an explicit test for the checksum verification in the completion callback
There is an existing test for testing an invalid page due to page header
verification in test_aio, but not for checksum failures.
I think it's indirectly covered (e.g. in amcheck), but seems better to test
it explicitly.
- Add error context callbacks for io worker and "foreign" IO completion
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2025-03-27 01:21:39 | Re: A small correction to doc and comment of FSM for indexes |
Previous Message | David Rowley | 2025-03-27 00:58:56 | Re: Partition pruning on parameters grouped into an array does not prune properly |