Re: BitmapHeapScan streaming read user and prelim refactoring

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BitmapHeapScan streaming read user and prelim refactoring
Date: 2025-02-12 20:07:51
Message-ID: 28075419-1513-44c0-94ae-f892c5bcf9a8@vondra.me
Thread:
Lists: pgsql-hackers

On 2/12/25 20:08, Robert Haas wrote:
> On Mon, Feb 10, 2025 at 4:41 PM Melanie Plageman
> <melanieplageman(at)gmail(dot)com> wrote:
>> I had taken to thinking of it as "queue depth". But I think that's not
>> really accurate. Do you know why we set it to 1 as the default? I
>> thought it was because the default should be just prefetch one block
>> ahead. But if it was meant to be spindles, was it that most people had
>> one spindle (I don't really know what a spindle is)?
>
> AFAIK, we're imagining that people are running PostgreSQL on a hard
> drive like the one my desktop machine had when I was in college, which
> indeed only had one spindle. It had one, or possibly several,
> magnetic platters rotating around a single point. But today, nobody
> has that, certainly not on a production machine. For instance if you
> are using RAID or LVM you have several drives hooked up together, so
> that would be multiple spindles even if all of them were HDDs, but
> probably they are all SSDs which don't have spindles anyway. If you
> are on a virtual machine you have a slice of the underlying machine's
> I/O and whatever that is it's probably something more complicated than
> a single spindle.

I think we've abandoned the idea that effective_io_concurrency has
anything to do with the number of spindles - it was always a bit wrong
because even spinning rust had TCQ/NCQ (i.e. the ability to optimize
head movement over a queue of requests). But it lost whatever meaning
remained once flash storage arrived. And in PG13 we even stopped
"recalculating" the prefetch distance, and now use the raw
effective_io_concurrency value directly.
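For reference, the GUC can be inspected and set at several levels; a quick sketch (the tablespace name "fast_ssd" is just a placeholder):

```sql
-- Illustrative only: where effective_io_concurrency can be set.
SHOW effective_io_concurrency;       -- default is 1

SET effective_io_concurrency = 32;   -- per-session override

-- Per-tablespace override, useful when storage types are mixed:
ALTER TABLESPACE fast_ssd SET (effective_io_concurrency = 32);
```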

>
> To be fair, I guess the concept of effective_io_concurrency isn't
> completely meaningless if you think about it as the number of
> simultaneous I/O requests that can be in flight at a time. But I still
> think it's true that it's hard to make any reasonable guess as to how
> you should set this value to get good performance. Maybe people who
> work as consultants and configure lots of systems have good intuition
> here, but anybody else is probably either not going to know about this
> parameter or flail around trying to figure out how to set it.
>

Yes, the "queue depth" interpretation seems to be much more accurate
than "number of spindles".

> It might be interesting to try some experiments on a real small VM
> from some cloud provider and see what value is optimal there these
> days. Then we could set the default to that value or perhaps somewhat
> more.
>

AFAICS the "1" value is simply one of the many "defensive" defaults in
our sample config. It's much more likely to help than cause harm, even
on smaller/older systems, but for many systems a higher value would be
more appropriate. There's usually a huge initial benefit (say, going to
16 or 32), and then the benefits diminish fairly quickly.
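The shape of that curve - big early gains, then a plateau - can be illustrated with a toy back-of-the-envelope model. This is not a benchmark; the latency and IOPS numbers are made up for illustration:

```python
# Toy model (illustrative, not a benchmark): with queue depth d, up to d
# random reads overlap, so per-request latency is amortized; but the
# device's throughput ceiling caps the achievable request rate.

LATENCY = 0.0001      # 100 us per random read (assumed)
MAX_IOPS = 50_000     # device throughput ceiling (assumed)

def scan_time(num_requests, queue_depth):
    """Estimated seconds to complete num_requests reads at a given depth."""
    iops = min(queue_depth / LATENCY, MAX_IOPS)
    return num_requests / iops

for depth in (1, 2, 4, 8, 16, 32):
    print(f"depth={depth:2d}  time={scan_time(100_000, depth):5.1f}s")
# With these numbers the time drops 10.0 -> 5.0 -> 2.5s, then flattens
# at 2.0s once the depth saturates the device ceiling.
```

The model matches the intuition above: the first few steps of queue depth roughly halve the scan time, and once the device is saturated, further increases do nothing.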

regards

--
Tomas Vondra
