Re: BitmapHeapScan streaming read user and prelim refactoring

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Melanie Plageman <melanieplageman(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
Subject: Re: BitmapHeapScan streaming read user and prelim refactoring
Date: 2024-03-13 22:38:38
Message-ID: CA+hUKG+a1NSHa-=7znx1EhmGXo+BFJH3mk3xJJLY3SPgJ0L2Bw@mail.gmail.com

On Sun, Mar 3, 2024 at 11:41 AM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> On 3/2/24 23:28, Melanie Plageman wrote:
> > On Sat, Mar 2, 2024 at 10:05 AM Tomas Vondra
> > <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> >> With the current "master" code, eic=1 means we'll issue a prefetch for B
> >> and then read+process A. And then issue prefetch for C and read+process
> >> B, and so on. It's always one page ahead.
> >
> > Yes, that is what I mean for eic = 1

I spent quite a few days thinking about the meaning of eic=0 and eic=1
for streaming_read.c v7[1], to make it agree with the above and with
master. Here's why I was confused:

Both eic=0 and eic=1 are expected to generate at most 1 physical I/O
at a time, or I/O queue depth 1 if you want to put it that way. But
this isn't just about concurrency of I/O, it's also about computation.
Duh.

eic=0 means that the I/O is not concurrent with executor computation.
So, to annotate an excerpt from [1]'s random.txt, we have:

effective_io_concurrency = 0, range size = 1
unpatched                              patched
==============================================================================
pread(43,...,8192,0x58000) = 8192      pread(82,...,8192,0x58000) = 8192
*** executor now has page at 0x58000 to work on ***
pread(43,...,8192,0xb0000) = 8192      pread(82,...,8192,0xb0000) = 8192
*** executor now has page at 0xb0000 to work on ***

eic=1 means that a single I/O is started and then control is returned
to the executor code to do useful work concurrently with the
background read that we assume is happening:

effective_io_concurrency = 1, range size = 1
unpatched                              patched
==============================================================================
pread(43,...,8192,0x58000) = 8192      pread(82,...,8192,0x58000) = 8192
posix_fadvise(43,0xb0000,0x2000,...)   posix_fadvise(82,0xb0000,0x2000,...)
*** executor now has page at 0x58000 to work on ***
pread(43,...,8192,0xb0000) = 8192      pread(82,...,8192,0xb0000) = 8192
posix_fadvise(43,0x108000,0x2000,...)  posix_fadvise(82,0x108000,0x2000,...)
*** executor now has page at 0xb0000 to work on ***
pread(43,...,8192,0x108000) = 8192     pread(82,...,8192,0x108000) = 8192
posix_fadvise(43,0x160000,0x2000,...)  posix_fadvise(82,0x160000,0x2000,...)

In other words, 'concurrency' doesn't mean 'number of I/Os running
concurrently with each other', it means 'number of I/Os running
concurrently with computation', and when you put it that way, 0 and 1
are different.
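
To make the two traces above concrete, here is a toy Python model (not
PostgreSQL code) that emits the same sequence of events for a list of
block offsets. The function name trace(), the event tuples and the
"advise up to eic blocks beyond the one just read" rule are assumptions
made for illustration; the rule matches the eic=0 and eic=1 traces shown
here and is not a claim about how larger settings behave in master or in
streaming_read.c.

```python
def trace(offsets, eic):
    """Return the modelled event sequence for reading `offsets` in order.

    Hypothetical model: after each synchronous read, advice is issued for
    blocks up to `eic` positions ahead that have not been advised yet.
    The first block never gets advice, since it is wanted immediately.
    """
    events = []
    next_unadvised = 1  # index 0 is read synchronously, never advised
    for i, off in enumerate(offsets):
        events.append(("pread", off))
        next_unadvised = max(next_unadvised, i + 1)
        while next_unadvised < len(offsets) and next_unadvised <= i + eic:
            events.append(("posix_fadvise", offsets[next_unadvised]))
            next_unadvised += 1
        # Control returns to the executor; any advised read overlaps this.
        events.append(("executor", off))
    return events


if __name__ == "__main__":
    blocks = [0x58000, 0xB0000, 0x108000]
    for eic in (0, 1):
        print(f"eic={eic}")
        for name, off in trace(blocks, eic):
            print(f"  {name:14s} {off:#x}")
```

With eic=0 the model never emits advice, so every pread is followed
directly by executor work; with eic=1 each pread is followed by one
posix_fadvise for the next block before the executor runs.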

Note that the first read is a bit special: if the consumer is ready to
pull a buffer out of the stream and we don't have one ready yet, it is
too late to issue useful advice, so we don't bother.
FWIW I think even in the AIO future we would have a synchronous read
in that specific place, at least when using io_method=worker, because
it would be stupid to ask another process to read a block for us that
we want right now and then wait for it to wake us up when it's done.

Note that even when we aren't issuing any advice because eic=0 or
because we detected sequential access and we believe the kernel can do
a better job than us, we still 'look ahead' (= call the callback to
see which block numbers are coming down the pipe), but only as far as
we need to coalesce neighbouring blocks. (I deliberately avoid using
the word "prefetch" except in very general discussions because it
means different things to different layers of the code, hence talk of
"look ahead" and "advice".) That's how we get this change:

effective_io_concurrency = 0, range size = 4
unpatched                              patched
==============================================================================
pread(43,...,8192,0x58000) = 8192      pread(82,...,8192,0x58000) = 8192
pread(43,...,8192,0x5a000) = 8192      preadv(82,...,2,0x5a000) = 16384
pread(43,...,8192,0x5c000) = 8192      pread(82,...,8192,0x5e000) = 8192
pread(43,...,8192,0x5e000) = 8192      preadv(82,...,4,0xb0000) = 32768
pread(43,...,8192,0xb0000) = 8192      preadv(82,...,4,0x108000) = 32768
pread(43,...,8192,0xb2000) = 8192      preadv(82,...,4,0x160000) = 32768
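
The coalescing of neighbouring blocks can be sketched roughly as follows.
This is a minimal Python illustration, not the streaming_read.c logic:
coalesce(), BLCKSZ and max_blocks are names chosen here for the example,
and the sketch does not try to model the shorter reads at the start of
the patched trace.

```python
BLCKSZ = 8192  # block size assumed from the 8192-byte reads in the traces


def coalesce(offsets, max_blocks):
    """Merge runs of adjacent block offsets into (start_offset, nblocks).

    A block joins the previous read only if it starts exactly where that
    read ends and the read has not yet reached `max_blocks` (hypothetical
    cap on the size of one vectored read).
    """
    reads = []
    for off in offsets:
        if reads:
            start, nblocks = reads[-1]
            if off == start + nblocks * BLCKSZ and nblocks < max_blocks:
                reads[-1] = (start, nblocks + 1)
                continue
        reads.append((off, 1))
    return reads


if __name__ == "__main__":
    offsets = [0xB0000, 0xB2000, 0xB4000, 0xB6000,
               0x108000, 0x10A000, 0x10C000, 0x10E000]
    for start, nblocks in coalesce(offsets, max_blocks=4):
        print(f"read {nblocks} block(s) at {start:#x} = {nblocks * BLCKSZ}")
```

Given the eight offsets of the ranges at 0xb0000 and 0x108000, this
yields two 4-block reads, in line with the preadv(82,...,4,...) calls in
the patched column.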

And then once we introduce eic > 0 to the picture with neighbouring
blocks that can be coalesced, "patched" starts to diverge even more
from "unpatched" because it tracks the number of wide I/Os in
progress, not the number of single blocks.

[1] https://www.postgresql.org/message-id/CA+hUKGLJi+c5jB3j6UvkgMYHky-qu+LPCsiNahUGSa5Z4DvyVA@mail.gmail.com
