Re: Some read stream improvements

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Some read stream improvements
Date: 2025-03-11 18:35:46
Message-ID: CA+hUKG+MvgpRRdxq5GgB=TdDHhAyyimp75raGX7siBO6=VpLdA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Feb 27, 2025 at 11:20 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2025-02-27 11:19:55 +1300, Thomas Munro wrote:
> > On Wed, Feb 26, 2025 at 10:55 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > I was working on expanding tests for AIO and as part of that wrote a test for
> > > temp tables -- our coverage is fairly awful, there were many times during AIO
> > > development where I knew I had trivially reachable temp table specific bugs
> > > but all tests passed.
> > >
> > > The test for that does trigger the problem described above and is fixed by the
> > > patches in this thread (which I included in the other thread):

Here is a subset of those patches again:

1. Per-backend buffer limit, take III. Now the check is in
read_stream_start_pending_read(), so time of check == time of use
(TOC == TOU).

Annoyingly, test cases like the one below still fail, despite
following the rules. The other streams eat all the buffers, and then
one stream gets an allowance of zero. It still uses its right to take
one pin to make progress, but there is no free buffer left to pin. I
wonder if we should use temp_buffers - 100? Then leave the minimum GUC
value at 100 still, so you have an easy way to test with 0, 1, ...
additional buffers?
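
To make the failure mode concrete, here is a toy model in Python. It
is not the read_stream.c logic, only a sketch under loud assumptions:
every stream wants 32 buffers (io_combine_limit) up front, streams
start one after another, and the names open_streams, wanted and
reserve are hypothetical. The reserve argument is one possible
reading of the "temp_buffers - 100" idea: look-ahead may only use
buffers beyond the reserve, while the one guaranteed pin may come from
anywhere.

```python
class NoFreeBuffer(Exception):
    """Raised when a stream's guaranteed single pin cannot be satisfied."""


def open_streams(pool_size, num_streams, wanted, reserve=0):
    """Return the number of pins each stream takes (toy model, see lead-in)."""
    free = pool_size
    taken = []
    for _ in range(num_streams):
        # Look-ahead may only use free buffers beyond the reserve.
        allowance = max(free - reserve, 0)
        pins = min(wanted, allowance)
        if pins == 0:
            # Allowance of zero: the stream still takes one pin to make
            # progress, which fails if nothing at all is free.
            if free == 0:
                raise NoFreeBuffer("no buffer left for the guaranteed pin")
            pins = 1
        free -= pins
        taken.append(pins)
    return taken


if __name__ == "__main__":
    try:
        open_streams(100, 5, 32)
    except NoFreeBuffer as e:
        print("temp_buffers=100, no reserve:", e)
    print(open_streams(100, 5, 32, reserve=100))  # [1, 1, 1, 1, 1]
    print(open_streams(132, 5, 32, reserve=100))  # [32, 1, 1, 1, 1]
```

In this model the first four streams take 32 + 32 + 32 + 4 = 100
buffers, so the fifth has nothing to pin. With a reserve of 100,
temp_buffers = 100 means zero additional buffers and every stream
falls back to its single pin.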

2. It shouldn't give up issuing random advice immediately after a
jump, or it could stall on (say) the second 128kB of a 256kB
sequential chunk (i.e. the strace you showed on the BHS thread). It
only makes sense to assume kernel readahead takes over once you've
actually *read* sequentially. In practice this makes it a lot more
aggressive about advice (like the BHS code in master): it only gives
up if the whole look-ahead window is sequential.
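
A rough Python sketch of the difference, not the patch itself. It
assumes reads are (start_block, nblocks) pairs, that 128kB is 16
blocks of 8kB, and that the look-ahead window can be approximated as
"the last N reads"; advice_flags, window and whole_window_rule are
hypothetical names for illustration only.

```python
def advice_flags(reads, window, whole_window_rule):
    """Return, per read, whether advice would be issued in this toy model.

    reads: list of (start_block, nblocks) in the order they are started.
    window: how many recent reads count as the look-ahead window.
    """
    flags = []
    sequential = []  # per read: does it start where the previous one ended?
    prev_end = None
    for i, (start, nblocks) in enumerate(reads):
        is_seq = prev_end is not None and start == prev_end
        sequential.append(is_seq)
        if whole_window_rule:
            # Give up on advice only if every read in the window,
            # including this one, was sequential.
            recent = sequential[max(0, i - window + 1):i + 1]
            flags.append(not all(recent))
        else:
            # Give up as soon as this read follows the previous one.
            flags.append(not is_seq)
        prev_end = start + nblocks
    return flags


if __name__ == "__main__":
    # Two random 256kB chunks, each split into two 128kB reads.
    jumps = [(1000, 16), (1016, 16), (5000, 16), (5016, 16)]
    print(advice_flags(jumps, 2, False))  # [True, False, True, False]
    print(advice_flags(jumps, 2, True))   # [True, True, True, True]

    # A plain sequential scan: advice stops once the window is sequential.
    scan = [(0, 16), (16, 16), (32, 16), (48, 16)]
    print(advice_flags(scan, 2, True))    # [True, True, False, False]
```

With the first rule the second half of each chunk gets no advice,
which is where the stall would show up. With the whole-window rule
advice only stops once the scan really is sequential.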

3. Change the distance algorithm to care only about hits and misses,
not sequential heuristics. It made at least some sense before, but it
doesn't make sense for AIO, and even in synchronous mode it means that
you hit random jumps with insufficient look-ahead, so I don't think we
should keep it.

I also realised that the sequential heuristics are confused by that
hidden trailing block thing, so in contrived pattern testing a
hit-miss-hit-miss... pattern would be considered sequential. Even if
you fix that (the forwarding patches above fix that), an exact
hit-miss-hit-miss pattern also gets stuck between distances 1 and 2
(double, decrement, double, ...). It might be worth waiting a bit
longer before decrementing, IDK.
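
The oscillation is easy to reproduce with a few lines of Python. This
is only a sketch of "double on miss, decrement on hit", not the code
in read_stream.c; max_distance and hits_before_decrement are
hypothetical knobs, the latter standing in for "waiting a bit longer
before decrementing".

```python
def simulate_distance(pattern, max_distance=16, hits_before_decrement=1):
    """Return the look-ahead distance after each access in pattern.

    pattern: string of 'h' (hit) and 'm' (miss).
    """
    distance = 1
    hit_run = 0
    trace = []
    for access in pattern:
        if access == "m":
            # A miss doubles the distance, up to the cap.
            distance = min(distance * 2, max_distance)
            hit_run = 0
        else:
            # A hit decrements, but only after enough consecutive hits.
            hit_run += 1
            if hit_run >= hits_before_decrement:
                distance = max(distance - 1, 1)
                hit_run = 0
        trace.append(distance)
    return trace


if __name__ == "__main__":
    print(simulate_distance("hmhmhmhm"))
    # [1, 2, 1, 2, 1, 2, 1, 2]
    print(simulate_distance("hmhmhmhm", hits_before_decrement=2))
    # [1, 2, 2, 4, 4, 8, 8, 16]
```

With an immediate decrement the distance never gets past 2 on an
exact hit-miss pattern. Requiring two hits in a row before
decrementing lets it climb to the cap in this model.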

I'll rebase the others and post soon.

set io_combine_limit = 32;
set temp_buffers = 100;

create temp table t1 as select generate_series(1, 10000);
create temp table t2 as select generate_series(1, 10000);
create temp table t3 as select generate_series(1, 10000);
create temp table t4 as select generate_series(1, 10000);
create temp table t5 as select generate_series(1, 10000);

do
$$
declare
  c1 cursor for select * from t1;
  c2 cursor for select * from t2;
  c3 cursor for select * from t3;
  c4 cursor for select * from t4;
  c5 cursor for select * from t5;
  x record;
begin
  open c1;
  open c2;
  open c3;
  open c4;
  open c5;
  loop
    fetch next from c1 into x;
    exit when not found;
    fetch next from c2 into x;
    exit when not found;
    fetch next from c3 into x;
    exit when not found;
    fetch next from c4 into x;
    exit when not found;
    fetch next from c5 into x;
    exit when not found;
  end loop;
end;
$$;

Attachment                                                       Content-Type  Size
v2-0001-Improve-buffer-manager-API-for-backend-pin-limits.patch  text/x-patch  6.7 KB
v2-0002-Respect-pin-limits-accurately-in-read_stream.c.patch     text/x-patch  9.7 KB
v2-0003-Improve-read-stream-advice-for-larger-random-read.patch  text/x-patch  4.0 KB
v2-0004-Look-ahead-more-when-sequential-in-read_stream.c.patch   text/x-patch  7.1 KB
