From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Melanie Plageman <melanieplageman(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Streaming read-ready sequential scan code
Date: 2024-04-04 11:45:12
Message-ID: CA+hUKG+wWKJ=mEXj0Xg06zQop6foV6dsirFYf2o7wMzbn_OKeA@mail.gmail.com
Lists: pgsql-hackers
On Thu, Apr 4, 2024 at 8:02 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> 3a4a3537a
> latency average = 34.497 ms
> latency average = 34.538 ms
>
> 3a4a3537a + read_stream_for_seqscans.patch
> latency average = 40.923 ms
> latency average = 41.415 ms
>
> i.e. no meaningful change from the refactor, but a regression from a
> cached workload that changes the page often without doing much work in
> between with the read stream patch.
I ran Heikki's test except I ran the "insert" 4 times to get a table
of 4376MB according to \d+. On my random cloud ARM server (SB=8GB,
huge pages, parallelism disabled), I see a speedup 1290ms -> 1046ms
when the data is in Linux cache and PG is not prewarmed, roughly as he
reported. Good.
If I pg_prewarm first, I see that slowdown 260ms -> 294ms. Trying
things out to see what works, I got that down to 243ms (i.e. beating
master) by inserting a memory prefetch:
--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -757,6 +757,8 @@ read_stream_next_buffer(ReadStream *stream, void **per_buffer_data)
 	/* Prepare for the next call. */
 	read_stream_look_ahead(stream, false);
+	__builtin_prefetch(BufferGetPage(stream->buffers[stream->oldest_buffer_index]));
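In case it's unfamiliar: __builtin_prefetch is the GCC/Clang builtin
that hints the CPU to start loading a cache line before it's needed.
A minimal sketch of just the documented builtin interface, nothing
from the patch itself:

/*
 * The one-argument form, as in the hunk above, defaults to a read
 * prefetch with maximum temporal locality.  The optional second and
 * third arguments make that explicit: rw (0 = read, 1 = write) and
 * locality (0..3, where 3 means keep it in all cache levels).
 */
static void
warm_cache(const void *page)
{
    __builtin_prefetch(page);       /* same form as the hunk above */
    __builtin_prefetch(page, 0, 3); /* explicit: read, high locality */
}

It's only a hint: per the GCC documentation, the data prefetch doesn't
fault even if the address is invalid, so it's cheap to issue
speculatively.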
Maybe that's a solution to a different problem that just happens to
more than make up the difference in this case, and it may be
questionable whether that cache line will survive long enough to help
you, but this one-tuple-per-page test likes it... Hmm, to get a more
realistic table than the one-tuple-per-page one, I tried doubling a
tenk1 table until it reached 2759MB and then I got a post-prewarm
regression 702ms -> 721ms, and again I can beat master by memory
prefetching: 689ms.
Annoyingly, with the same table I see no difference in the actual
pg_prewarm('giga') time: around 155ms for both. pg_prewarm is able to
use the 'fast path' I made pretty much just to be able to minimise
regression in that (probably stupid) all-cached test that doesn't even
look at the page contents. Unfortunately seq scan can't use it,
because it has per-buffer data, which is one of the things the fast
path can't handle (because of a space management issue). Maybe I
should try to find a way to fix that.
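To make that concrete, here's roughly what a stream consumer looks
like (a sketch only; scan_with_stream, my_next_block_cb and the
surrounding scaffolding are made-up names). The last argument to
read_stream_begin_relation() is per_buffer_data_size; passing 0 there
is what leaves the fast path available, and a non-zero size like seq
scan's is what rules it out:

#include "postgres.h"
#include "storage/bufmgr.h"
#include "storage/read_stream.h"
#include "utils/rel.h"

/* Hypothetical callback state: just walk the blocks in order. */
typedef struct MyScanState
{
    BlockNumber next;
    BlockNumber nblocks;
} MyScanState;

/* Hand back the next block number, or InvalidBlockNumber at the end. */
static BlockNumber
my_next_block_cb(ReadStream *stream, void *callback_private_data,
                 void *per_buffer_data)
{
    MyScanState *state = callback_private_data;

    if (state->next >= state->nblocks)
        return InvalidBlockNumber;
    return state->next++;
}

static void
scan_with_stream(Relation rel)
{
    MyScanState state = {0, RelationGetNumberOfBlocks(rel)};
    ReadStream *stream;
    Buffer      buf;

    stream = read_stream_begin_relation(READ_STREAM_SEQUENTIAL,
                                        NULL,   /* no buffer access strategy */
                                        rel,
                                        MAIN_FORKNUM,
                                        my_next_block_cb,
                                        &state, /* callback private data */
                                        0);     /* per_buffer_data_size */

    while ((buf = read_stream_next_buffer(stream, NULL)) != InvalidBuffer)
    {
        /* ... do something with BufferGetPage(buf) ... */
        ReleaseBuffer(buf);
    }
    read_stream_end(stream);
}

pg_prewarm's loop is essentially the one above with an empty body,
never looking at the page contents, which is why it qualifies.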
> I'm happy to run further benchmarks, but for the remainder of the
> committer responsibility for the next patches, I'm going to leave that
> to Thomas.
Thanks!