Re: Using read stream in autoprewarm

From: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Kirill Reshke <reshkekirill(at)gmail(dot)com>, Matheus Alcantara <mths(dot)dev(at)pm(dot)me>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Using read stream in autoprewarm
Date: 2025-04-02 10:26:42
Message-ID: CAN55FZ2ZSBLvfSF0wMAwNAkjZO6sMZrDhAKB2vZsBG0DnrVV=A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Wed, 2 Apr 2025 at 01:36, Melanie Plageman <melanieplageman(at)gmail(dot)com> wrote:
>
> On Tue, Apr 1, 2025 at 8:50 AM Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com> wrote:
> >
> > I am attaching v8, which is an updated version of the v7. I tried to
> > get rid of these local variables and refactored code to make logic
> > more straightforward instead of going back and forth.
> >
> > 0001 and 0002 are v8. 0003 is another refactoring attempt to make code
> > more straightforward. I did not squash 0003 to previous patches as you
> > might not like it.
>
> I looked at the code on your github branch that has all three of these
> squashed together.

Thank you!

> I think our approaches are converging. I like that you are
> fast-forwarding to the next filenumber or fork number explicitly when
> there is a bad relation or fork. I've changed my version (see newest
> one attached) to do the fast-forwarding inline instead of in a
> separate function like yours (the function didn't save many LOC and
> actually may have added to cognitive overhead).
>
> Compared to my version, I think you avoided one level of loop nesting with your
>
> if (!rel)
> else if (smgrexists(RelationGetSmgr(rel), blk->forknum))
> else
>
> but for starters, I don't think you can do this:
>
> else if (smgrexists(RelationGetSmgr(rel), blk->forknum))
>
> because you didn't check if you have a legal forknum first

You are right, I missed that. I think smgrexists() should return NULL
if the forknum is invalid but it is not a topic for this thread.

> And, I actually kind of prefer the explicitly nested structure
>
> loop through all relations
> loop through all forks
> loop through all buffers

I prefer this as well. We know when we opened the relation, so we do
not need to close it in two places like I did.

> While in the old structure, I liked your
> autoprewarm_prewarm_relation() function, but I think it is nicer
> inlined like in my version. It makes the loop through all buffers
> explicit too.

Yes, I liked your approach.

> I know you mentioned off-list that you don't like the handling of
> global objects in my version, but I prefer doing it this way (even
> though we have to check for in the loop condition) to having to set
> the current database once we reach non-shared objects. It feels too
> fiddly. This way seems less error prone. Looking at this version, what
> do you think? Could we do it better?

I think there might be a problem with that approach. Let's say that we
are able to open relation when database oid = 0 and filenumber = 18.
Then we are trying to find a valid fork now. We couldn't find a valid
fork immediately, so we continued looping. Then database oid is
changed from 0 to let's say 1 but filenumber remains the same. We are
still in the valid fork loop, so relation remains from the database
oid = 0. Isn't that wrong?

> Let me know what you think of this version. I think it is the best of
> both our approaches. I've separated it into two commits -- the first
> does all the refactoring without using the read stream API and the
> second one uses the read stream API.

Some comments,

0001:

ReadBufferExtended() can be called in its own minimal loop, otherwise
we end up doing unnecessary checks for each ReadBufferExtended() call.
This is not a problem when the 0002 is applied.

0002:

We don't skip blocks whose blocknum is more than nblocks_in_fork. We
can add that either to a stream callback like you did before or after
the read_stream_end. I prefer stream callback because of the reason
below [1].

> On another topic, what are the minimal places we need to call
> have_free_buffers() (in this version)? I haven't even started looking
> at the last patch you've been sending that is about checking the
> freelist. I'll have to do that next.

I think its current places are good enough. We may add one after the
read_stream_end if we want to handle blk->blocknum >= nblocks_in_fork
after the read stream finishes. If we handle that in the stream
callback then no need to add have_free_buffers() [1].

Other than these comments, I think the current structure looks good.

--
Regards,
Nazir Bilal Yavuz
Microsoft

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2025-04-02 10:39:34 Re: Fix 035_standby_logical_decoding.pl race conditions
Previous Message Ashutosh Bapat 2025-04-02 10:06:33 Re: Test to dump and restore objects left behind by regression