Re: BTScanOpaqueData size slows down tests

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Tomas Vondra <tomas(at)vondra(dot)me>
Subject: Re: BTScanOpaqueData size slows down tests
Date: 2025-04-02 15:45:59
Message-ID: CAH2-WznTtOn9Tek409P8YynXsrPD7NsZHq194M9o81QXQN78+Q@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 2, 2025 at 11:36 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Ouch! I had no idea it had gotten that big. Yeah, we ought to
> do something about that.

Tomas Vondra talked about this recently, in the context of his work on
prefetching.

> > And/or perhaps we could allocate BTScanOpaqueData.markPos as a whole
> > only when mark/restore are used?
>
> That'd be an easy way of removing about half of the problem, but
> 14kB is still too much. How badly do we need this items array?
> Couldn't we just reference the on-page items?

I'm not sure what you mean by that. The whole design of _bt_readpage
is based on the idea that we read a whole page, in one go. It has to
batch up the items that are to be returned from the page somewhere.
The worst case is that there are about 1350 TIDs to return from any
single page (assuming default BLCKSZ). It's very pessimistic to start
from the assumption that that worst case will be hit, but I don't see
a way around doing it at least some of the time.
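
As a rough check on the "about 1350 TIDs" figure, the arithmetic can be
sketched as below. The sizes used here (24-byte page header, 16-byte
nbtree special area, 6-byte TID, 10-byte per-item entry in the items
array) are assumptions about a default build rather than numbers taken
from this message, and all of the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sizes for a default build; illustrative only. */
enum
{
    SKETCH_BLCKSZ = 8192,       /* default block size */
    SKETCH_PAGE_HEADER = 24,    /* assumed page header size */
    SKETCH_BT_SPECIAL = 16,     /* assumed nbtree special area size */
    SKETCH_TID_SIZE = 6,        /* assumed size of one heap TID */
    SKETCH_ITEM_SIZE = 10       /* assumed size of one items[] entry */
};

/* Upper bound on TIDs that fit on one leaf page, ignoring tuple overhead. */
size_t
sketch_max_tids_per_page(void)
{
    assert(SKETCH_BLCKSZ > SKETCH_PAGE_HEADER + SKETCH_BT_SPECIAL);
    return (size_t) (SKETCH_BLCKSZ - SKETCH_PAGE_HEADER - SKETCH_BT_SPECIAL)
        / SKETCH_TID_SIZE;
}

/* Bytes needed for one items array sized for that worst case. */
size_t
sketch_items_array_bytes(void)
{
    return sketch_max_tids_per_page() * SKETCH_ITEM_SIZE;
}
```

Under those assumptions one worst-case items array comes to a little
under 14kB, and there is one such array each for the current and the
marked position.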

The first thing I'd try is some kind of simple dynamic allocation
scheme, with a small built-in array that avoided any allocation
penalty in the common case where there weren't too many tuples to
return from the page.
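
A minimal sketch of what such a scheme might look like, written as
standalone C rather than as a patch. The inline threshold of 32 items,
the struct layout, and every name below are hypothetical; in the server
the allocations would presumably go through palloc/repalloc in the
scan's memory context instead of malloc/realloc.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_INLINE_ITEMS 32  /* hypothetical "common case" size */

/* Stand-in for one saved item; not the real per-item struct. */
typedef struct SketchItem
{
    uint32_t    block;
    uint16_t    offset;
    uint16_t    index_offset;
} SketchItem;

typedef struct SketchItemBuf
{
    SketchItem *items;          /* inline_items, or a heap allocation */
    size_t      nitems;
    size_t      capacity;
    SketchItem  inline_items[SKETCH_INLINE_ITEMS];
} SketchItemBuf;

/* Start out pointing at the built-in array: no allocation at all. */
void
sketch_buf_init(SketchItemBuf *buf)
{
    assert(buf != NULL);
    buf->items = buf->inline_items;
    buf->nitems = 0;
    buf->capacity = SKETCH_INLINE_ITEMS;
}

/* Append one item, spilling to the heap only when the array is full. */
int
sketch_buf_append(SketchItemBuf *buf, SketchItem item)
{
    if (buf->nitems == buf->capacity)
    {
        size_t      newcap = buf->capacity * 2;
        SketchItem *newitems;

        if (buf->items == buf->inline_items)
        {
            /* First spill: copy the inline contents to a heap block. */
            newitems = malloc(newcap * sizeof(SketchItem));
            if (newitems == NULL)
                return -1;
            memcpy(newitems, buf->inline_items,
                   buf->nitems * sizeof(SketchItem));
        }
        else
        {
            newitems = realloc(buf->items, newcap * sizeof(SketchItem));
            if (newitems == NULL)
                return -1;
        }
        buf->items = newitems;
        buf->capacity = newcap;
    }
    buf->items[buf->nitems++] = item;
    return 0;
}

/* Release any heap block and fall back to the built-in array. */
void
sketch_buf_free(SketchItemBuf *buf)
{
    if (buf->items != buf->inline_items)
        free(buf->items);
    sketch_buf_init(buf);
}
```

Because the struct holds a pointer into itself, it cannot be copied
with a plain memcpy while the inline array is in use; a real version
would need to handle that wherever a position is saved or restored.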

The way that we allocate BLCKSZ bytes twice for index-only scans (once
for so->currTuples, once for so->markTuples) is also pretty
inefficient, especially because any kind of use of mark and restore is
exceedingly rare.
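
One way to picture the alternative is to defer the second buffer until
a mark is actually taken. The sketch below is standalone C with
hypothetical names and a simplified scan struct; it is not how the
nbtree code is organized, only an illustration of allocating the
mark-side workspace lazily.

```c
#include <stdlib.h>
#include <string.h>

#define SKETCH_BLCKSZ 8192      /* assumed default block size */

/* Simplified stand-in for the tuple workspace of an index-only scan. */
typedef struct SketchScan
{
    char       *curr_tuples;    /* allocated when the scan begins */
    size_t      curr_len;       /* bytes of curr_tuples in use */
    char       *mark_tuples;    /* NULL until the first mark */
    size_t      mark_len;
} SketchScan;

/* Allocate only the workspace every index-only scan needs. */
int
sketch_scan_begin(SketchScan *scan)
{
    scan->curr_tuples = malloc(SKETCH_BLCKSZ);
    scan->curr_len = 0;
    scan->mark_tuples = NULL;
    scan->mark_len = 0;
    return scan->curr_tuples != NULL ? 0 : -1;
}

/* Pay for the second buffer only if a mark is actually requested. */
int
sketch_scan_mark(SketchScan *scan)
{
    if (scan->mark_tuples == NULL)
    {
        scan->mark_tuples = malloc(SKETCH_BLCKSZ);
        if (scan->mark_tuples == NULL)
            return -1;
    }
    memcpy(scan->mark_tuples, scan->curr_tuples, scan->curr_len);
    scan->mark_len = scan->curr_len;
    return 0;
}

/* Free whatever was allocated; free(NULL) is a no-op. */
void
sketch_scan_end(SketchScan *scan)
{
    free(scan->curr_tuples);
    free(scan->mark_tuples);
    scan->curr_tuples = NULL;
    scan->mark_tuples = NULL;
    scan->curr_len = 0;
    scan->mark_len = 0;
}
```

Scans that never mark a position would then carry one BLCKSZ-sized
buffer instead of two.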

--
Peter Geoghegan
