Re: Hybrid Hash/Nested Loop joins and caching results from subplans

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Date: 2021-03-12 00:38:49
Message-ID: CAApHDvopso12iG_gKD7Vu0qu7Xo2UWdp4508Zb=k8toi=DA_iw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 23 Feb 2021 at 14:22, Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> wrote:
>
> On Mon, Feb 22, 2021 at 9:21 AM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
>> - Maybe this should be integrated into nestloop rather than being a separate
>> plan node. That means that it could be dynamically enabled during
>> execution, maybe after a few loops or after checking that there's at least
>> some minimal number of repeated keys and cache hits. cost_nestloop would
>> consider whether to use a result cache or not, and explain would show the
>> cache stats as a part of nested loop.
>
>
> +1 for this idea now.. I am always confused why there is no such node in Oracle
> even if it is so aggressive to do performance improvement and this function
> looks very promising. After realizing the costs in planner, I think planning time
> might be an answer (BTW, I am still not sure Oracle did this).

If you're voting for merging Result Cache with Nested Loop and making
it a single node, then that was already suggested on this thread. I
didn't really like the idea and I wasn't alone on that. Tom didn't
much like it either. Never-the-less, I went and coded it and found
that it made the whole thing slower.

There's nothing stopping Result Cache from switching itself off if it
sees poor cache hit ratios. It can then just become a proxy node,
effectively doing nothing apart from fetching from its own outer node
when asked for a tuple. It does not need to be part of Nested Loop to
have that ability.

David

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2021-03-12 00:41:04 Re: automatic analyze: readahead - add "IO read time" log message
Previous Message David Rowley 2021-03-12 00:31:43 Re: Hybrid Hash/Nested Loop joins and caching results from subplans