From: | Simon Riggs <simon(at)2ndquadrant(dot)com> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Hybrid Hash/Nested Loop joins and caching results from subplans |
Date: | 2020-05-20 12:56:34 |
Message-ID: | CANP8+jLkh_g7Eu-=WoNP7P-Mi2Ye=Kt-5kceho80Xnd8xWrvfw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, 20 May 2020 at 12:44, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> Hackers,
>
> Over on [1], Heikki mentioned about the usefulness of caching results
> from parameterized subplans so that they could be used again for
> subsequent scans which have the same parameters as a previous scan.
> On [2], I mentioned that parameterized nested loop joins could see
> similar gains with such a cache. I suggested there that instead of
> adding code that only allows this to work for subplans, that instead,
> we add a new node type that can handle the caching for us. We can
> then just inject that node type in places where it seems beneficial.
>
Very cool
> I've attached a patch which implements this. The new node type is
> called "Result Cache". I'm not particularly wedded to keeping that
> name, but if I change it, I only want to do it once. I've got a few
> other names I mind, but I don't feel strongly or confident enough in
> them to go and do the renaming.
>
> How the caching works:
>
> First off, it's only good for plugging in on top of parameterized
> nodes that are rescanned with different parameters. The cache itself
> uses a hash table using the simplehash.h implementation. The memory
> consumption is limited to work_mem. The code maintains an LRU list and
> when we need to add new entries but don't have enough space to do so,
> we free off older items starting at the top of the LRU list. When we
> get a cache hit, we move that entry to the end of the LRU list so that
> it'll be the last to be evicted.
>
> When should we cache:
>
> For nested loop joins, the decision is made purely based on cost.
I thought the main reason to do this was the case when the nested loop
subplan was significantly underestimated and we realize during execution
that we should have built a hash table. So including this based on cost
alone seems to miss a trick.
> The patch does rely heavily on good ndistinct estimates.
Exactly. We know we seldom get those with many-way joins.
So +1 for adding this technique. My question is whether it should be added
as an optional facility of a parameterised sub plan, rather than an
always-needed full-strength node. That way the choice of whether to use it
can happen at execution time once we notice that we've been called too many
times.
--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
Mission Critical Databases
From | Date | Subject | |
---|---|---|---|
Next Message | Vik Fearing | 2020-05-20 13:04:20 | Re: SEARCH and CYCLE clauses |
Previous Message | Atsushi Torikoshi | 2020-05-20 12:56:04 | Re: Is it useful to record whether plans are generic or custom? |