From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Noah Misch <noah(at)leadboat(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, Antonin Houska <ah(at)cybertec(dot)at> |
Subject: | Re: AIO v2.5 |
Date: | 2025-03-29 14:48:10 |
Message-ID: | d4ff3qxudxn2ze3zk2xwhds5ysezfei3uqeuimymj624t5chsd@sgr2h3rmln5t |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2025-03-29 06:41:43 -0700, Noah Misch wrote:
> On Fri, Mar 28, 2025 at 11:35:23PM -0400, Andres Freund wrote:
> > But I finally got to a point where the code ends up readable, without undue
> > duplication. It would, leaving some nasty hack aside, require a
> > errhint_internal() - but I can't imagine a reason against introducing that,
> > given we have it for the errmsg and errhint.
>
> Introducing that is fine.
Cool.
> > Here's the relevant code:
> >
> > /*
> > * Treat a read that had both zeroed buffers *and* ignored checksums as a
> > * special case, it's too irregular to be emitted the same way as the other
> > * cases.
> > */
> > if (zeroed_any && ignored_any)
> > {
> > Assert(zeroed_any && ignored_any);
> > Assert(nblocks > 1); /* same block can't be both zeroed and ignored */
> > Assert(result.status != PGAIO_RS_ERROR);
> > affected_count = zeroed_or_error_count;
> >
> > ereport(elevel,
> > errcode(ERRCODE_DATA_CORRUPTED),
> > errmsg("zeroing %u pages and ignoring %u checksum failures among blocks %u..%u of relation %s",
> > affected_count, checkfail_count, first, last, rpath.str),
>
> Translation stumbles on this one, because each of the first two %u are
> plural-sensitive.
Fair. We don't generally seem to have been very careful around this in relate
code, but there's no reason to just continue down that road when it's easy.
E.g. in md.c we unconditionally output "could not read blocks %u..%u in file \"%s\": %m"
even if it's just a single block...
> I'd do one of:
>
> - Call ereport() twice, once for zeroed pages and once for ignored checksums.
> Since elevel <= ERROR here, that doesn't lose the second call.
>
> - s/pages/page(s)/ like msgid "There are %d other session(s) and %d prepared
> transaction(s) using the database."
I think I like this better.
> > /*
> > * The other messages are highly repetitive. To avoid duplicating a long
> > * and complicated ereport(), gather the translated format strings
> > * separately and then do one common ereport.
> > */
> > if (result.status == PGAIO_RS_ERROR)
> > {
> > Assert(!zeroed_any); /* can't have invalid pages when zeroing them */
> > affected_count = zeroed_or_error_count;
> > msg_one = _("invalid page in block %u of relation %s");
> > msg_mult = _("%u invalid pages among blocks %u..%u of relation %s");
> > det_mult = _("Block %u held first invalid page.");
> > hint_mult = _("See server log for the other %u invalid blocks.");
>
> For each hint_mult, we would usually use ngettext() instead of _(). (Would be
> errhint_plural() if not separated from its ereport().) Alternatively,
> s/blocks/block(s)/ is fine.
I will go with the (s) here as well, this stuff is too rare to be worth having
pluralized messages imo.
> > Does that approach make sense?
>
> Yes.
> ...
> > I like the replacement. It moves the important part to the front, and it's
> shorter.
Cool, I squashed them with the relevant changes now.
Attached is v2.14:
Changes:
- Added a commit to fix stats attribution of checksum errors, previously the
checksum errors detected bufmgr.c/storage.c were always attributed to the
current database
This would have caused bigger issues with worker based IO, as IO workers
aren't connected to databases.
- Added a commit to allow checksum error reports to happen in critical
sections. For that a pgstat_prepare_report_checksum_failure() has to be
called in the same backend, to report the critical section.
Other suggestions for the name welcome.
- Expanded on the idea in 13 to track the number of invalid buffers in the
IO's result, by also tracking checksum errors. Combined with the previous
point, this fixes the issue of an assert during checksum failure reporting
outlined in:
https://postgr.es/m/5tyic6epvdlmd6eddgelv47syg2b5cpwffjam54axp25xyq2ga%40ptwkinxqo3az
This required being a bit more careful with space in the error, to be able
to squeeze in the checksum number.
- The ignore_checksum_failure of the issuer needs to be used when completing
IO, not the one of the completor, particularly when using io_method=worker
For that the access to ignore_checksum_failure had to be moved from
PageIsVerified() to its callers.
I added tests for ignore_checksum_failure, including its interplay with
zero_damaged_pages.
- Deduplicated the error reporting in buffer_readv_report() somewhat by only
having the selection of format strings be done in branches. I think this
ends up a lot more readable than the huge ereport before.
- Added details about the changed error/warning logging to "bufmgr: Implement
AIO read support"'s commit message.
- polished the commit to add PGAIO_RS_WARNING a bit, adding defines for the
bit-widths of PgAioResult portions and added static asserts to verify them
- Squashed the changes that I had kept separately in v2.13, it was too hard to
do that while doing the above changes.
I did make the encoding function cast the arguments to uint32 before
shifting. I think that's implied by the C integer promotion rules, but it
seemed fishy enough to not want to believe in that.
I also added a StaticAssertStmt() to ensure we are only using the available
bit space.
- Added a test for a) checksum errors being detected b) CREATE DATABASE
... STRATEGY WAL_LOG
The latter is interesting because it also provides test coverage for doing
IO for objects in other databases.
- Removed an obsoleted inclusion of pg_trace.h in localbuf.c
TODO:
- I think the tests around zero_damaged_pages, ignore_checksum_failure should
be expanded a bit more. There's two FIXME in the tests about that.
At the moment there are two different test functions for zero_damaged_pages
and ignore_checksum_failure, I'm not sure how good that is.
I wanted to get this version out, because I have to run some errands,
otherwise I'd have implemente them first...
Next steps:
- push the checksums stats fix
- unless somebody sees a reason to not use LOG_SERVER_ONLY in
"aio: Implement support for reads in smgr/md/fd", push that
Besides that the only change since Noah's last review of that commit is an
added comment.
- push acronym, glossary change
- push pg_aios view (depends a tiny bit on the smgr/md/fd change above)
- push "localbuf: Track pincount in BufferDesc as well" - I think I addressed
all of Noah's review feedback
- address the above TODO
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2025-03-29 15:15:47 | Why does wait_for_log() return current file size |
Previous Message | David G. Johnston | 2025-03-29 14:40:55 | Re: in BeginCopyTo make materialized view using COPY TO instead of COPY (query). |