Re: BitmapHeapScan streaming read user and prelim refactoring

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BitmapHeapScan streaming read user and prelim refactoring
Date: 2025-02-13 00:40:30
Message-ID: CAAKRu_YhYy6Te5ZNiijtayr6mM-7O0B3UXm7mFOCN8U=iFgoxg@mail.gmail.com
Lists: pgsql-hackers

On Sun, Feb 9, 2025 at 9:27 AM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>
> For the nvme RAID (device: raid-nvme), it's looks almost exactly the
> same, except that with parallel query (page 27) there's a clear area of
> regression with eic=1 (look for "column" of red cells). That's a bit
> unfortunate, because eic=1 is the default value.

So, after even more analysis today with Thomas Munro, I feel pretty
confident that all of the parallel regressions with
effective_io_concurrency == 1 happen because master cheats on
effective_io_concurrency -- it ends up issuing more concurrent I/O
than the setting permits. Therefore, I'm not too worried about those
for now. We probably will have to increase the default
effective_io_concurrency before merging this patch, though.

Thomas mentioned this to me off-list, and I think he's right. We
likely need to rethink the way parallel bitmap heap scan workers get
block assignments for reading and prefetching to make it more similar
to parallel sequential scan. The workers should probably get
assignments of a range of blocks. On master, each worker does end up
issuing reads/fadvises for a bunch of blocks in a row -- even though
that isn't the intent of the parallel bitmap table scan
implementation. We are losing some of that with the patch -- but only
because it is behaving as you would expect given the implementation
and design. I don't consider that a blocker, though.

> For the SATA SSD RAID (device: raid-sata), it's similar - a couple
> regressions for eic=0, just like for NVMe. But then there's also a
> couple regressions for higher eic values, usually for queries with
> selective conditions.

I started trying to reproduce the regressions you saw on your SATA
devices for higher eic values. I focused on serial bitmap heap scan,
since that doesn't have the problems mentioned above. I don't have a
SATA drive, but I used dmsetup to add a 10ms delay to my NVMe drive.
Unfortunately, that did the opposite of reproducing the issue: with a
10ms dm delay, the patch is 5 times faster than master.
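For anyone who wants to try the same experiment, a sketch of the
dm-delay setup follows. This requires root, and the device path,
mapped-device name, and mount point are examples, not anything from
the test setup described above:

```shell
# Emulate a slow (SATA-like) drive by layering the dm-delay target
# over a fast NVMe device.  All I/O to /dev/mapper/slow_nvme then
# incurs an extra 10 ms of latency.
DEV=/dev/nvme0n1                     # underlying device (example)
SECTORS=$(blockdev --getsz "$DEV")   # device size in 512-byte sectors

# dm-delay table format: <start> <length> delay <device> <offset> <delay_ms>
echo "0 $SECTORS delay $DEV 0 10" | dmsetup create slow_nvme

# Put the PostgreSQL data directory (or a tablespace) on the mapped
# device, e.g.:  mkfs.ext4 /dev/mapper/slow_nvme &&
#                mount /dev/mapper/slow_nvme /mnt/slowpg

# Tear down when finished:
# umount /mnt/slowpg && dmsetup remove slow_nvme
```

The same table accepts separate read and write delays (extra
`<device> <offset> <delay>` triples), which can be useful when only
read latency is of interest.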

- Melanie
