Quick Links

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

From:	David Rowley <dgrowleyml(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Date:	2020-11-12 02:36:18
Message-ID:	CAApHDvqvGZUPKHO+4Xp7Lm_q1OXBo2Yp1=5pVnEUcr4dgOXxEg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Tue, 10 Nov 2020 at 12:55, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
> On Tue, 10 Nov 2020 at 12:49, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >
> > Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
> > > Are you taking into account the possibility that generated machine code
> > > is a small percent slower out of mere bad luck? I remember someone
> > > suggesting that they can make code 2% faster or so by inserting random
> > > no-op instructions in the binary, or something like that. So if the
> > > difference between v8 and v9 is that small, then it might be due to this
> > > kind of effect.
> >
> > Yeah. I believe what this arises from is good or bad luck about relevant
> > tight loops falling within or across cache lines, and that sort of thing.
> > We've definitely seen performance changes up to a couple percent with
> > no apparent change to the relevant code.
>
> It possibly is this issue.
>
> Normally how I build up my confidence in which is faster is why just
> rebasing on master as it advances and see if the winner ever changes.
> The theory here is if one patch is consistently the fastest, then
> there's more chance if there being a genuine reason for it.

I kicked off a script last night that ran benchmarks on master, v8 and
v9 of the patch on 1 commit per day for the past 30 days since
yesterday. The idea here is that as the code changes that if the
performance differences are due to code alignment then there should be
enough churn in 30 days to show if this is the case.

The quickly put together script is attached. It would need quite a bit
of modification to run on someone else's machine.

This took about 20 hours to run. I found that v8 is faster on 28 out
of 30 commits. In the two cases where v9 was faster, v9 took 99.8% and
98.5% of the time of v8. In the 28 cases where v8 was faster it was
generally about 2-4% faster, but a couple of times 8-10% faster. Full
results attached in .csv file. Also the query I ran to compare the
results once loaded into Postgres.

David

Attachment	Content-Type	Size
resultcachebench3.sh.txt	text/plain	2.4 KB
resultcache_small_multiple_version.csv	application/vnd.ms-excel	40.1 KB
resultcache_30_commits_test_results.txt	text/plain	4.5 KB

In response to

Re: Hybrid Hash/Nested Loop joins and caching results from subplans at 2020-11-09 23:55:01 from David Rowley

Responses

Re: Hybrid Hash/Nested Loop joins and caching results from subplans at 2020-11-22 13:21:06 from Andy Fan
Re: Hybrid Hash/Nested Loop joins and caching results from subplans at 2020-12-04 14:41:21 from David Rowley

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	tsunakawa.takay@fujitsu.com	2020-11-12 02:46:08	RE: Disable WAL logging to speed up data loading
Previous Message	Fujii Masao	2020-11-12 02:28:32	Re: enable pg_stat_statements to track rows processed by REFRESH MATERIALIZED VIEW