From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, jakub(dot)wartak(at)enterprisedb(dot)com
Subject: Re: Use streaming read API in ANALYZE
Date: 2024-04-08 01:20:21
Message-ID: CA+hUKGJU8HZvVwjTLsdr=fHw=qUmenSBk03URo2rP4X2KtFU-w@mail.gmail.com
Lists: pgsql-hackers
On Mon, Apr 8, 2024 at 10:26 AM Melanie Plageman
<melanieplageman(at)gmail(dot)com> wrote:
> On Sun, Apr 07, 2024 at 03:00:00PM -0700, Andres Freund wrote:
> > > src/backend/commands/analyze.c | 89 ++++++++++------------------------
> > > 1 file changed, 26 insertions(+), 63 deletions(-)
> >
> > That's a very nice demonstration of how this makes good prefetching easier...
>
> Agreed. Yay streaming read API and Bilal!
+1
I found a few comments to tweak: just a couple of places that hadn't
got the memo after we renamed "read stream", and an obsolete mention
of pinning buffers. I adjusted those directly.
I ran some tests on a random basic Linux/ARM cloud box with a 7.6GB
table, and I got:
                                        cold     hot
  master:                               9025ms   199ms
  patched, io_combine_limit=1:          9025ms   191ms
  patched, io_combine_limit=default:    8729ms   191ms
Even though the sample is random, occasionally some I/Os must get
merged, allowing slightly better throughput when accessing disk
blocks through a 3000 IOPS drinking straw. Looking at strace, I see
29144 pread* calls instead of 30071, which fits that theory. Let's
see... if you roll a fair 973452-sided dice 30071 times, how many
times do you expect to roll consecutive numbers? Since the sampled
blocks are visited in ascending order, each sampled block has a
30071/973452 = ~3% chance that the very next block is also in the
sample, letting those two reads merge. 9025ms minus 3% is 8754ms.
Seems about right.
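That back-of-the-envelope estimate can be sanity-checked with a few
lines of Python (a sketch; the block count, sample size, and timings
are just the figures quoted above):

```python
# Expected number of merged reads when sampling `sampled` of `total`
# blocks and visiting them in ascending block-number order: each
# sampled block's immediate successor is also sampled with
# probability ~ sampled/total, and each such adjacent pair lets two
# reads merge into one.

def expected_merges(total: int, sampled: int) -> float:
    return sampled * (sampled / total)

total_blocks = 973452    # blocks in the 7.6GB table (from the message)
sample_size = 30071      # pread* calls on master (from strace)

merges = expected_merges(total_blocks, sample_size)
print(round(merges))                       # ~929; observed 30071 - 29144 = 927
print(round(9025 * (1 - sample_size / total_blocks)))  # predicted cold ms, ~8746
```

The prediction lands within a percent of the observed 8729ms cold
time, which supports the merging explanation.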
I am not sure why the hot number is faster, exactly. (Anecdotally, I
did notice that in the cases that beat master semi-unexpectedly like
this, my software memory prefetch patch doesn't help or hurt, while
in other cases and on other CPUs where there is little difference
from master, that patch seems to produce a speed-up like this, which
might be a clue. *Shrug*, investigation needed.)
Pushed. Thanks Bilal and reviewers!