From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Date: 2020-12-04 14:41:21
Message-ID: CAApHDvpGX7RN+sh7Hn9HWZQKp53SjKaL=GtDzYheHWiEd-8moQ@mail.gmail.com
Lists: pgsql-hackers
On Thu, 12 Nov 2020 at 15:36, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> I kicked off a script last night that ran benchmarks on master, v8, and
> v9 of the patch against one commit per day for the 30 days prior to
> yesterday. The idea is that if the performance differences are due to
> code alignment, then 30 days' worth of code churn should be enough to
> show it.
>
> The quickly put together script is attached. It would need quite a bit
> of modification to run on someone else's machine.
>
> This took about 20 hours to run. I found that v8 is faster on 28 out
> of 30 commits. In the two cases where v9 was faster, v9 took 99.8% and
> 98.5% of the time of v8. In the 28 cases where v8 was faster it was
> generally about 2-4% faster, but a couple of times it was 8-10% faster.
> Full results are attached in a .csv file, along with the query I ran to
> compare the results once they were loaded into Postgres.
Since running those benchmarks, Andres spent a little bit of time looking at the v9 patch and pointed out that I can use the same projection info in the nested loop code both with and without a cache hit; I just need to ensure that inneropsfixed is false so that the expression compilation includes a deform step when result caching is enabled. Making it work that way did give a small performance improvement, but further benchmarking showed that it was still not as fast as the v8 patch (separate Result Cache node).
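To illustrate what I mean by that (this is just a toy standalone sketch with made-up names, not the executor code from the patch): if the compiled expression always begins with a deform step that expands the inner tuple into individual columns, then one and the same projection can be reused whether the tuple came from the cache or from the inner plan.

/*
 * Toy illustration only -- not the patch's executor code.  The point is
 * that a projection which starts with an explicit "deform" step can be
 * shared by the cache-hit path (tuple fetched from the cache) and the
 * cache-miss path (tuple fetched from the inner plan), because both paths
 * hand it the same deformed representation.
 */
#include <stdio.h>

#define NATTS 3

typedef struct ToyTuple
{
	int		values[NATTS];	/* pretend this is a packed tuple */
} ToyTuple;

/* "deform" the tuple into individual columns the projection can address */
static void
toy_deform(const ToyTuple *tup, int *columns)
{
	for (int i = 0; i < NATTS; i++)
		columns[i] = tup->values[i];
}

/* the shared projection works on deformed columns, not on raw tuples */
static int
toy_project(const int *columns)
{
	return columns[0] + columns[2];
}

static int
evaluate(const ToyTuple *inner_tuple)
{
	int		columns[NATTS];

	toy_deform(inner_tuple, columns);	/* always included, so both paths work */
	return toy_project(columns);
}

int
main(void)
{
	ToyTuple	from_cache = {{1, 2, 3}};
	ToyTuple	from_inner_plan = {{4, 5, 6}};

	printf("cache hit:  %d\n", evaluate(&from_cache));
	printf("cache miss: %d\n", evaluate(&from_inner_plan));
	return 0;
}

That always-present deform step is what keeping inneropsfixed false buys us when result caching is enabled.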
Due to that, I want to push forward with having the separate Result
Cache node and just drop the idea of including the feature as part of
the Nested Loop node.
I've attached an updated patch, v10. This is v8 with a few further changes: I added the peak memory tracking, adjusted a few comments, and added a paragraph to explain what RC_CACHE_BYPASS_MODE is. I also noticed that the code I'd written to build the cache lookup expression included an unnecessary step to deform the outer tuple, which slowed down the expression evaluation.
I'm fairly happy with patches 0001 to 0003. However, I ended up stripping the subplan caching code out of 0003 and putting it in 0004, and that part I'm not so happy with. The problem is that when planning a correlated subquery we don't have any context to determine how many distinct values the subplan will be called with. For now, the 0004 patch just always adds a Result Cache for correlated subqueries. The reason I don't like that is that the additional cost of adding tuples to the cache could slow things down when the cache never gets a hit.
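To put a number on that concern, here's a toy sketch (deliberately simplistic, nothing to do with the actual patch) of the worst case: when every parameter value is distinct, each call pays the full subplan cost plus the cost of inserting a cache entry that will never be reused.

/*
 * Toy illustration of the worst case for a result cache -- not the patch's
 * implementation.  Every key is distinct, so the cache never gets a hit and
 * each call pays the subplan cost plus the cost of a useless cache insert.
 */
#include <stdio.h>

#define CACHE_SLOTS 1024

typedef struct CacheEntry
{
	int		key;
	long	value;
	int		used;
} CacheEntry;

static CacheEntry cache[CACHE_SLOTS];
static long lookups, hits, inserts;

/* stand-in for executing the correlated subplan for one parameter value */
static long
run_subplan(int param)
{
	return (long) param * param;
}

static long
cached_subplan(int param)
{
	CacheEntry *entry = &cache[(unsigned) param % CACHE_SLOTS];

	lookups++;
	if (entry->used && entry->key == param)
	{
		hits++;
		return entry->value;	/* the cheap path we're hoping for */
	}

	/* miss: do the real work, then pay extra to populate the cache */
	entry->key = param;
	entry->value = run_subplan(param);
	entry->used = 1;
	inserts++;
	return entry->value;
}

int
main(void)
{
	for (int i = 0; i < 100000; i++)	/* every parameter value is distinct */
		(void) cached_subplan(i);

	printf("lookups=%ld hits=%ld inserts=%ld\n", lookups, hits, inserts);
	return 0;
}

Running that reports 100000 lookups, 0 hits and 100000 inserts: all of the cache maintenance is wasted work, which is exactly the case I'd like 0004 to be able to avoid.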
I'm not yet sure of the best way to make 0004 better. I don't think using AlternativeSubplans is a good choice, as it means having to build two subplans, and determining the cheapest plan to use couldn't reuse the existing logic in fix_alternative_subplan(). It might be best left until we do some refactoring so that, instead of building subplans as soon as we've run the planner, we keep a list of Paths around and choose the best Path once the top-level plan has been planned. That's a pretty big change.
On making another pass over this patchset, I feel there are two points
that might still raise a few eyebrows:
1. To avoid having Nested Loops with an inner Result Cache picked when the inner index's parameters have no valid statistics, I modified estimate_num_groups() to add a new parameter that allows callers to pass an EstimationInfo struct, in which the function sets a flag to indicate if DEFAULT_NUM_DISTINCT was used. Callers which don't care about this can just pass NULL. (A rough sketch of the intended call pattern follows after point 2.) I did once try adding a new parameter to clauselist_selectivity() in 2686ee1b; there was not much excitement about that and we ended up removing it again. I don't see any alternative here.
2. Nobody really mentioned that they dislike the name Result Cache. I had only really used that as a placeholder name until I came up with something better. I mentioned a few other names in [1]. If nobody objects to Result Cache, I'll just keep it named that way.
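For point 1, here's roughly the call pattern I have in mind, as a standalone toy. The flags field and the SELFLAG_USED_DEFAULT name are just illustrative here, and the signature is heavily simplified compared with the real estimate_num_groups().

/*
 * Rough sketch of the call pattern described in point 1 -- not the patch's
 * code.  The flag name and the simplified signature are illustrative only;
 * the real estimate_num_groups() works from the planner's expression lists.
 */
#include <stdio.h>

#define DEFAULT_NUM_DISTINCT	200			/* PostgreSQL's fallback ndistinct */
#define SELFLAG_USED_DEFAULT	(1 << 0)	/* illustrative flag name */

typedef struct EstimationInfo
{
	unsigned int flags;
} EstimationInfo;

/* simplified stand-in for estimate_num_groups() with the extra parameter */
static double
toy_estimate_num_groups(double ndistinct_from_stats, EstimationInfo *estinfo)
{
	if (estinfo != NULL)
		estinfo->flags = 0;

	if (ndistinct_from_stats > 0)
		return ndistinct_from_stats;

	/* no usable statistics: fall back, and tell the caller we did so */
	if (estinfo != NULL)
		estinfo->flags |= SELFLAG_USED_DEFAULT;
	return DEFAULT_NUM_DISTINCT;
}

int
main(void)
{
	EstimationInfo estinfo;
	double		ngroups = toy_estimate_num_groups(-1, &estinfo);

	if (estinfo.flags & SELFLAG_USED_DEFAULT)
		printf("estimate %.0f is just a guess; don't cost a Result Cache on it\n",
			   ngroups);

	/* a caller that doesn't care about the flag simply passes NULL */
	(void) toy_estimate_num_groups(42, NULL);
	return 0;
}

The join costing code can then skip considering a Result Cache whenever that flag is set, rather than trusting an ndistinct value that was never really estimated.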
David
Attachment | Content-Type | Size |
---|---|---|
v10-0003-Add-Result-Cache-executor-node.patch | text/plain | 144.7 KB |
v10-0004-Use-a-Result-Cache-node-to-cache-results-from-su.patch | text/plain | 25.7 KB |
v10-0002-Allow-users-of-simplehash.h-to-perform-direct-de.patch | text/plain | 3.5 KB |
v10-0001-Allow-estimate_num_groups-to-pass-back-further-d.patch | text/plain | 9.1 KB |