Quick Links

Re: AIO v2.5

From:	Noah Misch <noah(at)leadboat(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, Antonin Houska <ah(at)cybertec(dot)at>
Subject:	Re: AIO v2.5
Date:	2025-04-01 16:07:27
Message-ID:	20250401160727.bd.nmisch@google.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Tue, Apr 01, 2025 at 11:55:20AM -0400, Andres Freund wrote:
> On 2025-04-01 08:11:59 -0700, Noah Misch wrote:
> > On Mon, Mar 31, 2025 at 08:41:39PM -0400, Andres Freund wrote:

> I haven't yet pushed the changes, but will work on that in the afternoon.
>
> I plan to afterwards close the CF entry and will eventually create a new one
> for write support, although probably only rebasing onto
> https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf%40gcnactj4z56m
> and addressing some of the locking issues.

Sounds good.

> WRT the locking issues, I've been wondering whether we could make
> LWLockWaitForVar() work that purpose, but I doubt it's the right approach.
> Probably better to get rid of the LWLock*Var functions and go for the approach
> I had in v1, namely a version of LWLockAcquire() with a callback that gets
> called between LWLockQueueSelf() and PGSemaphoreLock(), which can cause the
> lock acquisition to abort.

What are the best thing(s) to read to understand the locking issues?

> > > +# Verify checksum handling when creating database from an invalid database.
> > > +# This also serves as a minimal check that cross-database IO is handled
> > > +# reasonably.
> >
> > To me, "invalid database" is a term of art from the message "cannot connect to
> > invalid database". Hence, I would change "invalid database" to "database w/
> > invalid block" or similar, here and below. (Alternatively, just delete "from
> > an invalid database". It's clear from the context.)
>
> Yea, I agree, this is easy to misunderstand when stepping back. I went for "with
> an invalid block".

Sounds good.

> > > + if (corrupt_checksum)
> > > + {
> > > + bool successfully_corrupted = 0;
> > > +
> > > + /*
> > > + * Any single modification of the checksum could just end up being
> > > + * valid again. To be sure
> > > + */
> >
> > Unfinished sentence.
>
> Oops. See below.
>
>
> > That said, I'm not following why we'd need this loop. If this test code
> > were changing the input to the checksum, it's true that an input bit flip
> > might reach the same pd_checksum. The test case is changing pd_checksum,
> > not the input bits.
>
> We might be changing the input, due to the zero/corrupt_header options. Or we
> might be called on a page that is *already* corrupted. I did encounter that
> situation once while writing tests, where the tests only passed if I made the
> + 1 a + 2. Which was, uh, rather confusing and left me feel like I was cursed
> that day.

Got it.

> > I don't see how changing pd_checksum could leave the
> > page still valid. There's only one valid pd_checksum value for a given
> > input page.
>
> I updated the comment to:
> /*
> * Any single modification of the checksum could just end up being
> * valid again, due to e.g. corrupt_header changing the data in a way
> * that'd result in the "corrupted" checksum, or the checksum already
> * being invalid. Retry in that, unlikely, case.
> */

Works for me.

In response to

Re: AIO v2.5 at 2025-04-01 15:55:20 from Andres Freund

Responses

Re: AIO v2.5 at 2025-04-01 16:51:53 from Andres Freund

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Nathan Bossart	2025-04-01 16:25:23	Re: Improve CRC32C performance on SSE4.2
Previous Message	Sami Imseih	2025-04-01 15:58:30	Re: Proposal - Allow extensions to set a Plan Identifier