Re: Using read stream in autoprewarm

From: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Kirill Reshke <reshkekirill(at)gmail(dot)com>, Matheus Alcantara <mths(dot)dev(at)pm(dot)me>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Using read stream in autoprewarm
Date: 2025-03-30 14:00:45
Message-ID: CAN55FZ32QnQM41RMRU7a8T6A7anycWPr9PbK0dozOfgz9SxRPA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Thank you for looking into this!

On Sat, 29 Mar 2025 at 23:10, Melanie Plageman
<melanieplageman(at)gmail(dot)com> wrote:
>
> On Fri, Nov 29, 2024 at 6:20 AM Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com> wrote:
> >
> > v4 is attached.
>
> I've started looking at this version. What I don't like about this
> structure is that we are forced to increment the position in the array
> of BlockInfoRecords in both the callback and the main loop in
> autoprewarm_database_main(). There isn't a way around it because we
> have to return control to the user when we encounter a new relation
> (we can't close the relation and destroy the read stream in the
> callback). And in the main loop in autoprewarm_database_main(), we may
> fail to open the next relation and then need to keep advancing the
> position in the array of BlockInfoRecords.
>
> It isn't just that we have to advance the position in both places --
> we also have to have a special case for the first block. All in all,
> given that in the current read stream API, a single stream must only
> concern itself with a single relation, fork combo, I think there is no
> elegant way to deal with this in autoprewarm.
>
> One alternative is to loop through the array of BlockInfoRecords and
> get the start and end positions of the blocks in the arary for a
> single relation/fork combo. Then we could make the read stream and
> pass those two positions and the array as callback_private_data. That
> would mean we loop through the whole array twice, but I wonder if the
> improvement in clarity is worth it?

I think this is a good alternative. I will work on this and try to
propose a patch.

> Some review feedback on your v4: I don't think we need the
> rs_have_free_buffer per_buffer_data. We can just check
> have_free_buffers() in both the callback and main loop in
> autoprewarm_database_main(). I also think you want a comment about why
> first_block is needed. And I think you need to guard the
> read_stream_end() after the loop -- what if we never made a read
> stream because we errored out for all the block's relations or
> something?

All of these are addressed. One extra thing I noticed is we were not
checking if blocknum < number_of_block_in_relation at the first_block
case in the stream callback, this is fixed now.

--
Regards,
Nazir Bilal Yavuz
Microsoft

Attachment Content-Type Size
v5-0001-Optimize-autoprewarm-with-read-streams.patch application/octet-stream 6.0 KB
v5-0002-Count-free-buffers-at-the-start-of-the-autoprewar.patch application/octet-stream 2.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2025-03-30 15:03:06 Re: pg_stat_database.checksum_failures vs shared relations
Previous Message jian he 2025-03-30 13:53:22 Re: speedup COPY TO for partitioned table.