Re: BAS_BULKREAD vs read stream

From: Andres Freund <andres(at)anarazel(dot)de>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: BAS_BULKREAD vs read stream
Date: 2025-04-08 02:20:38
Message-ID: hdcz65oehetyjygraktst2xb77qnyiig5mxhi46yhvfmzdgnqo@tarjv3ofqoe4
Lists: pgsql-hackers

Hi,

On 2025-04-07 16:28:20 -0400, Andres Freund wrote:
> On 2025-04-07 15:24:43 -0400, Melanie Plageman wrote:
> > On Sun, Apr 6, 2025 at 4:15 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > >
> > > I think we should consider increasing BAS_BULKREAD TO something like
> > > Min(256, io_combine_limit * (effective_io_concurrency + 1))
> >
> > Do you mean Max? If so, this basically makes sense to me.
>
> Err, yes.

In the attached I implemented the above idea, with some small additional
refinements:

- To allow sync seqscans to work at all, we should only *add* to the 256kB
that we currently have - otherwise all buffers in a ring will be undergoing
IO, never allowing two synchronizing scans to actually use the buffers that
the other scan has already read in.

This also kind of obsoletes the + 1 in the formula above, although that is
arguable, particularly for effective_io_concurrency=0.

- If the backend has a PinLimit() that won't allow io_combine_limit *
effective_io_concurrency buffers to undergo IO, it doesn't make sense to
make the ring bigger. At best it would waste space for the ring, at worst
it'd make "ring escapes" inevitable - victim buffer search would always
replace buffers that we have in the ring.

- the multiplication by (BLCKSZ / 1024) that I omitted above is actually
included :)
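
As a rough sketch of the sizing rule described in the points above (this is
not the code from the attached patch; the function name, the parameter names
and the exact way the pin limit caps the ring are assumptions made for
illustration only):

```c
/* Smaller of two ints. */
static int
min_int(int a, int b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical helper, not taken from the patch: compute a BAS_BULKREAD
 * ring size in kB.
 *
 * blcksz                    block size in bytes (BLCKSZ)
 * io_combine_limit          max blocks combined into one IO
 * effective_io_concurrency  number of IOs in flight
 * pin_limit                 assumed per-backend cap on pinned buffers
 */
static int
bulkread_ring_size_kb(int blcksz, int io_combine_limit,
					  int effective_io_concurrency, int pin_limit)
{
	/* Keep the existing 256kB so synchronized seqscans still work. */
	int			base_kb = 256;

	/* Buffers that could be undergoing IO at the same time. */
	int			io_buffers = io_combine_limit * effective_io_concurrency;

	/* Assumption: never add more buffers than the backend may pin. */
	int			extra_buffers = min_int(io_buffers, pin_limit);

	/* Only *add* to the base, converting buffers to kB. */
	return base_kb + extra_buffers * (blcksz / 1024);
}
```

With assumed values of blcksz = 8192, io_combine_limit = 16,
effective_io_concurrency = 16 and a pin limit that is not the constraint, this
gives 256 + 256 * 8 = 2304 kB. With effective_io_concurrency = 0 the ring
stays at 256 kB.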

I unfortunately think we do need *something* to address $subject for 18 - the
performance regression when increasing relation sizes is otherwise just too
big - it's trivial to find queries getting slower by more than 4x, and that is
on local, low-latency NVMe storage - on network storage the regression will
often be bigger.

If we don't do something for 18, the only consolation would be that the
performance when using the 256kB BAS_BULKREAD is rather close to the
performance one gets in 17, with or without a strategy. But I don't think
that would make it less surprising that once your table grows sufficiently
large to use a strategy your IO throughput craters.

I've some local prototype for the 17/18 "strategy escape" issue, will work on
polishing that soon, unless you have something for that, Thomas?

Greetings,

Andres Freund

Attachment Content-Type Size
v2-0001-Increase-BAS_BULKREAD-based-on-effective_io_concu.patch text/x-diff 3.3 KB
