Re: BitmapHeapScan streaming read user and prelim refactoring

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BitmapHeapScan streaming read user and prelim refactoring
Date: 2025-02-10 21:22:21
Message-ID: CAAKRu_Y8rtavnZCSTk2DrYjKMygoSYuVKdPzbV5WviLzzeUKUA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Feb 10, 2025 at 1:02 PM Melanie Plageman
<melanieplageman(at)gmail(dot)com> wrote:
>
> It'll be hard to look into all of these, so I think I'll focus on
> trying to reproduce something with eic=1 that I can reproduce on my
> machine. So far, I can reproduce a regression with the following and
> the data file attached.
>
> # initdb and get set up with shared_buffers 1GB
> psql -c "create table bitmap_scan_test (a bigint, b bigint, c text)
> with (fillfactor = 25)"
> psql -c "copy bitmap_scan_test from '/tmp/bitmap_scan_test.data'"
> psql -c "create index on bitmap_scan_test (a)"
> psql -c "vacuum analyze"
> psql -c "checkpoint"
>
> pg_ctl stop
> echo 3 | sudo tee /proc/sys/vm/drop_caches
> pg_ctl start
> psql -c "SET max_parallel_workers_per_gather = 4;" \
> -c "SET effective_io_concurrency = 1;" \
> -c "SET parallel_setup_cost = 0;" \
> -c "SET parallel_tuple_cost = 0;" \
> -c "SET enable_seqscan = off;" \
> -c "SET enable_indexscan = off;" \
> -c "SET work_mem = 65536;"
>
> psql -c "EXPLAIN SELECT * FROM bitmap_scan_test WHERE (a BETWEEN -33
> AND 10015) OFFSET 1000000;"
> psql -c "SELECT * FROM bitmap_scan_test WHERE (a BETWEEN -33 AND
> 10015) OFFSET 1000000;"

I think I figured out why there is a regression. On master, parallel
bitmap heap scans seem to end up cheating effective_io_concurrency.

What you expect to see with effective_io_concurrency == 1 is a single
pread followed by a single fadvise. We can prefetch up to one block
before reading the next block. This is what you see on both the patch
and master with a serial bitmap heap scan. This is also what you see
with the patch if you strace a participating parallel bitmap heap scan
process. On master, however, you do not see this 1-1 interleaving for
parallel bitmap heap scan.

On master, we typically issue many fadvises in a row, followed by a few
preads in a row. For example:
fadvise64
fadvise64
fadvise64
fadvise64
pread64
fadvise64
fadvise64
pread64
pread64
fadvise64

On master, while executing this query, the leader issued more than 2000
runs of two or more fadvises or preads in a row. With the patch, there
are essentially none.
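
To make the run counting concrete, here is a minimal Python sketch of
how such runs could be tallied from a list of syscall names taken from
strace output. It assumes a "run" means two or more consecutive calls
of the same syscall; count_runs is a hypothetical helper, not the
script used to produce the numbers above.

```python
from itertools import groupby


def count_runs(calls, min_len=2):
    """Count maximal runs of one repeated syscall name of length >= min_len.

    Assumption: a "run" is a stretch of identical consecutive syscalls.
    """
    return sum(
        1
        for _, group in groupby(calls)
        if sum(1 for _ in group) >= min_len
    )


# The example sequence shown above for master.
master_sample = [
    "fadvise64", "fadvise64", "fadvise64", "fadvise64",
    "pread64",
    "fadvise64", "fadvise64",
    "pread64", "pread64",
    "fadvise64",
]

# Strict 1-1 interleaving, as expected with effective_io_concurrency == 1.
interleaved = ["pread64", "fadvise64"] * 5

print(count_runs(master_sample))  # 3
print(count_runs(interleaved))    # 0
```

The sample from master contains three such runs (four fadvises, two
fadvises, two preads), while a strictly interleaved trace contains
none.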

On master, parallel bitmap heap scans' prefetching behavior is
controlled by some shared pstate members, prefetch_target and
prefetch_pages. Prefetching is supposed to be allowed only up to
prefetch_target -- which is capped at effective_io_concurrency.
These variables are incremented and decremented based only on their
current values in shared memory, not on whether the process actually
did a read or a prefetch. I think that, due to quirks of CPU
scheduling, some of the processes end up issuing more consecutive
reads and prefetches while another process increments and decrements
those values in a way that makes this possible.
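
A toy model of that theory, as a rough Python sketch. This is not
PostgreSQL code: only the names prefetch_target and prefetch_pages come
from the description above, while SharedState, adjust, try_prefetch,
run, and the hand-written schedules are hypothetical. It assumes a read
consumes one outstanding prefetch credit and a prefetch is allowed
whenever the shared counter is below the target.

```python
class SharedState:
    """Stand-in for the shared prefetch counters (toy model only)."""

    def __init__(self, prefetch_target):
        self.prefetch_target = prefetch_target
        self.prefetch_pages = 0


def adjust(state):
    # Before a read: consume one prefetch credit, whoever created it.
    if state.prefetch_pages > 0:
        state.prefetch_pages -= 1


def try_prefetch(state):
    # A prefetch is allowed only while the shared counter is below target.
    if state.prefetch_pages < state.prefetch_target:
        state.prefetch_pages += 1
        return True
    return False


def run(schedule, prefetch_target=1):
    """Replay (process, action) steps; return the syscalls per process."""
    state = SharedState(prefetch_target)
    trace = {}
    for proc, action in schedule:
        calls = trace.setdefault(proc, [])
        if action == "read":
            adjust(state)
            calls.append("pread64")
        elif action == "prefetch" and try_prefetch(state):
            calls.append("fadvise64")
    return trace


# One process: every read is followed by exactly one prefetch.
serial = [("A", "read"), ("A", "prefetch")] * 3

# Two processes: B's reads keep freeing the credit that A then reuses.
parallel = [
    ("A", "read"), ("A", "prefetch"),
    ("B", "read"), ("A", "prefetch"), ("B", "prefetch"),
    ("B", "read"), ("A", "prefetch"), ("B", "prefetch"),
]

print(run(serial))
print(run(parallel))
```

With the serial schedule, process A alternates pread64 and fadvise64.
With the parallel schedule, A ends up with one pread64 followed by
three fadvise64 calls and B with two pread64 calls back to back, even
though prefetch_pages never exceeds a prefetch_target of 1.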

This effectively increases effective_io_concurrency for parallel
bitmap heap scans on master. The patch can't really compete because it
is interleaving every read with an fadvise -- preventing readahead.

I don't really know what to do about this. The behavior of master
parallel bitmap heap scan can be emulated with the patch by increasing
effective_io_concurrency. But, IIRC, we didn't want to do that for some
reason?

Not only does effective_io_concurrency == 1 negatively affect
readahead, but it also prevents read combining regardless of
io_combine_limit.

- Melanie
