Re: Streaming read-ready sequential scan code

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Streaming read-ready sequential scan code
Date: 2024-04-04 15:20:35
Message-ID: CAAKRu_arLQjAknvDOa+DQkOPcZRtS257dETFLgrhEwAVdv+WBQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Apr 4, 2024 at 7:45 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>
> On Thu, Apr 4, 2024 at 8:02 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> > 3a4a3537a
> > latency average = 34.497 ms
> > latency average = 34.538 ms
> >
> > 3a4a3537a + read_stream_for_seqscans.patch
> > latency average = 40.923 ms
> > latency average = 41.415 ms
> >
> > i.e. no meaningful change from the refactor, but a regression from a
> > cached workload that changes the page often without doing much work in
> > between with the read stream patch.
>
> I ran Heikki's test except I ran the "insert" 4 times to get a table
> of 4376MB according to \d+. On my random cloud ARM server (SB=8GB,
> huge pages, parallelism disabled), I see a speedup of 1290ms -> 1046ms
> when the data is in Linux cache and PG is not prewarmed, roughly as he
> reported. Good.
>
> If I pg_prewarm first, I see that slowdown 260ms -> 294ms. Trying
> things out to see what works, I got that down to 243ms (ie beat
> master) by inserting a memory prefetch:
>
> --- a/src/backend/storage/aio/read_stream.c
> +++ b/src/backend/storage/aio/read_stream.c
> @@ -757,6 +757,8 @@ read_stream_next_buffer(ReadStream *stream, void **per_buffer_data)
> /* Prepare for the next call. */
> read_stream_look_ahead(stream, false);
>
> + __builtin_prefetch(BufferGetPage(stream->buffers[stream->oldest_buffer_index]));
>
> Maybe that's a solution to a different problem that just happens to
> more than make up the difference in this case, and it may be
> questionable whether that cache line will survive long enough to help
> you, but this one-tuple-per-page test likes it... Hmm, to get a more
> realistic table than the one-tuple-per-page one, I tried doubling a
> tenk1 table until it reached 2759MB and then I got a post-prewarm
> regression 702ms -> 721ms, and again I can beat master by memory
> prefetching: 689ms.
>
> Annoyingly, with the same table I see no difference in the actual
> pg_prewarm('giga') time: around 155ms for both. pg_prewarm is able to
> use the 'fast path' I made pretty much just to be able to minimise
> regression in that (probably stupid) all-cached test that doesn't even
> look at the page contents. Unfortunately seq scan can't use it
> because it has per-buffer data, which is one of the things it can't do
> (because of a space management issue). Maybe I should try to find a
> way to fix that.

So, sequential scan does not have per-buffer data. I did some logging,
and the reason most fully-in-SB sequential scans don't use the fast
path is that read_stream->pending_read_nblocks is always 0.

When read_stream->distance stays 1 (expected for the all-in-SB case, as
it is initialized to 1 and we don't want the distance to ramp up),
read_stream_look_ahead() never increments
read_stream->pending_read_nblocks, because it sets it to 1 the first
time it is called and then the condition of the while loop is not met
again:

    while (stream->ios_in_progress < stream->max_ios &&
           stream->pinned_buffers + stream->pending_read_nblocks <
           stream->distance)

distance is 1 and pending_read_nblocks is 1, so we only loop once and
don't increment pending_read_nblocks.
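
To make the arithmetic concrete, here is a tiny throwaway program
(plain C, not the real read_stream.c logic) that runs just the loop
condition above with distance fixed at 1:

#include <stdio.h>

int
main(void)
{
    /* Simplified stand-ins for the stream fields named in the loop above. */
    int     ios_in_progress = 0;
    int     max_ios = 16;
    int     pinned_buffers = 0;
    int     pending_read_nblocks = 0;
    int     distance = 1;       /* stays 1 for an all-cached scan */
    int     iterations = 0;

    while (ios_in_progress < max_ios &&
           pinned_buffers + pending_read_nblocks < distance)
    {
        /* The real loop would build or extend the pending read here. */
        pending_read_nblocks = 1;
        iterations++;
    }

    printf("iterations = %d, pending_read_nblocks = %d\n",
           iterations, pending_read_nblocks);
    /* Prints: iterations = 1, pending_read_nblocks = 1 */
    return 0;
}

The loop body runs exactly once and the condition never passes again,
so the pending read can't grow.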

pg_prewarm is only able to use the fast path because it passes
READ_STREAM_FULL and thus initializes read_stream->distance to a
higher initial value.
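
For anyone not following along in the code, this is roughly what the
READ_STREAM_FULL setup looks like from the caller's side. It's a
simplified sketch against the read stream API in master, not the
actual pg_prewarm code, and the demo_* names are made up:

#include "postgres.h"

#include "storage/bufmgr.h"
#include "storage/read_stream.h"
#include "utils/rel.h"

/* Hypothetical callback state: stream blocks [0, last_exclusive). */
typedef struct demo_stream_private
{
    BlockNumber next_block;
    BlockNumber last_exclusive;
} demo_stream_private;

static BlockNumber
demo_stream_next_block(ReadStream *stream,
                       void *callback_private_data,
                       void *per_buffer_data)
{
    demo_stream_private *p = callback_private_data;

    if (p->next_block < p->last_exclusive)
        return p->next_block++;

    return InvalidBlockNumber;  /* end of stream */
}

/*
 * Pin and release every block of 'rel'.  READ_STREAM_FULL declares that
 * the caller intends to read the whole relation, so the stream can start
 * with a larger look-ahead distance; READ_STREAM_DEFAULT starts at 1.
 */
static void
demo_read_all(Relation rel)
{
    demo_stream_private p;
    ReadStream *stream;
    Buffer      buf;

    p.next_block = 0;
    p.last_exclusive = RelationGetNumberOfBlocks(rel);

    stream = read_stream_begin_relation(READ_STREAM_FULL,
                                        NULL,
                                        rel,
                                        MAIN_FORKNUM,
                                        demo_stream_next_block,
                                        &p,
                                        0);

    while ((buf = read_stream_next_buffer(stream, NULL)) != InvalidBuffer)
        ReleaseBuffer(buf);

    read_stream_end(stream);
}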

I added some logging to see if any of the sequential scans in the
regression suite used the fast path. The one example I see of the fast
path being used is a temp table IO stats test in
src/test/regress/sql/stats.sql. I didn't check exactly what conditions
led it to do this. But we probably want seq scans which are all in SB
to use the fast path.
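
As an aside on the __builtin_prefetch experiment quoted above: for
anyone who wants to try the idea outside the tree, here is a minimal,
self-contained sketch of the same trick (plain C with made-up names,
not read_stream.c; needs GCC or Clang for the builtin). It just hints
the next page's first cache line into cache while the caller is still
busy with the current page:

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define NPAGES  4096
#define PAGESZ  8192

/* Stand-in for "look at the tuples on the current page". */
static uint64_t
consume_page(const unsigned char *page)
{
    uint64_t    sum = 0;

    for (size_t i = 0; i < PAGESZ; i += 64)
        sum += page[i];
    return sum;
}

int
main(void)
{
    unsigned char *pages = calloc(NPAGES, PAGESZ);
    uint64_t    total = 0;

    if (pages == NULL)
        return 1;

    for (int i = 0; i < NPAGES; i++)
    {
        /*
         * Prefetch the next page's first cache line before we start
         * working on the current one, analogous to prefetching the
         * oldest queued buffer before handing back the current buffer.
         */
        if (i + 1 < NPAGES)
            __builtin_prefetch(pages + (size_t) (i + 1) * PAGESZ);

        total += consume_page(pages + (size_t) i * PAGESZ);
    }

    printf("checksum %llu\n", (unsigned long long) total);
    free(pages);
    return 0;
}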

- Melanie
