Re: Use read streams in pg_visibility

From: Noah Misch <noah(at)leadboat(dot)com>
To: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Use read streams in pg_visibility
Date: 2024-08-30 23:51:06
Message-ID: 20240830235106.ac.nmisch@google.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Aug 27, 2024 at 10:49:19AM +0300, Nazir Bilal Yavuz wrote:
> On Fri, 23 Aug 2024 at 22:01, Noah Misch <noah(at)leadboat(dot)com> wrote:
> > On Fri, Aug 23, 2024 at 02:20:06PM +0300, Nazir Bilal Yavuz wrote:
> > > On Tue, 20 Aug 2024 at 21:47, Noah Misch <noah(at)leadboat(dot)com> wrote:
> > > > On Tue, Aug 13, 2024 at 03:22:27PM +0300, Nazir Bilal Yavuz wrote:

> I liked the block_range_read_stream_cb. Attached patches for that
> (first 3 patches). I chose an nblocks way instead of last_blocks in
> the struct.

To read blocks 10 and 11, I would expect to initialize the struct with one of:

{ .first=10, .nblocks=2 }
{ .first=10, .last_inclusive=11 }
{ .first=10, .last_exclusive=12 }

With the patch's API, I would need {.first=10,.nblocks=12}. The struct field
named "nblocks" behaves like a last_block_exclusive. Please either make the
behavior an "nblocks" behavior or change the field name to replace the term
"nblocks" with something matching the behavior. (I used longer field names in
my examples here, to disambiguate those examples. It's okay if the final
field names aren't those, as long as the field names and the behavior align.)

> > > > The callback doesn't return blocks having zero vm bits, so the blkno variable
> > > > is not accurate. I didn't test, but I think the loop's "Recheck to avoid
> > > > returning spurious results." looks at the bit for the wrong block. If that's
> > > > what v1 does, could you expand the test file to catch that? For example, make
> > > > a two-block table with only the second block all-visible.
> > >
> > > Yes, it was not accurate. I am getting blockno from the buffer now. I
> > > checked and confirmed it is working as expected by manually logging
> > > blocknos returned from the read stream. I am not sure how to add a
> > > test case for this.
> >
> > VACUUM FREEZE makes an all-visible, all-frozen table. DELETE of a particular
> > TID, even if rolled back, clears both vm bits for the TID's page. Past tests
> > like that had instability problems. One cause is a concurrent session's XID
> > or snapshot, which can prevent VACUUM setting vm bits. Using a TEMP table may
> > have been one of the countermeasures, but I don't remember clearly. Hence,
> > please search the archives or the existing pg_visibility tests for how we
> > dealt with that. It may not be problem for this particular test.
>
> Thanks for the information, I will check these. What I still do not
> understand is how to make sure that only the second block is processed
> and the first one is skipped. pg_check_visible() and pg_check_frozen()
> returns TIDs that cause corruption in the visibility map, there is no
> information about block numbers.

I see what you're saying. collect_corrupt_items() needs a corrupt table to
report anything; all corruption-free tables get the same output. Testing this
would need extra C code or techniques like corrupt_page_checksum() to create
the corrupt state. That wouldn't be a bad thing to have, but it's big enough
that I'll consider it out of scope for $SUBJECT. With the callback change
above, I'll be ready to push all this.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2024-08-31 00:12:23 Re: Interrupts vs signals
Previous Message Heikki Linnakangas 2024-08-30 22:17:40 Re: Interrupts vs signals