Re: Hybrid Hash/Nested Loop joins and caching results from subplans

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Date: 2020-08-25 17:18:34
Message-ID: CAKU4AWqA-jXLLsNa5CCu8rKLOfe8-WTjUU7U3o3-MvgkH1MdXw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Aug 25, 2020 at 11:53 PM Andres Freund <andres(at)anarazel(dot)de> wrote:

> Hi,
>
> On 2020-08-25 20:48:37 +1200, David Rowley wrote:
> > On Tue, 25 Aug 2020 at 08:26, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > While I'm against introducing a separate node for the caching, I'm
> *not*
> > > against displaying a different node type when caching is
> > > present. E.g. it'd be perfectly reasonable from my POV to have a
> 'Cached
> > > Nested Loop' join and a plain 'Nested Loop' node in the above node. I'd
> > > probably still want to display the 'Cache Key' similar to your example,
> > > but I don't see how it'd be better to display it with one more
> > > intermediary node.
> >
> > ...Well, this is difficult... For the record, in case anyone missed
> > it, I'm pretty set on being against doing any node overloading for
> > this. I think it's a pretty horrid modularity violation regardless of
> > what text appears in EXPLAIN. I think if we merge these nodes then we
> > may as well go further and merge in other simple nodes like LIMIT.
>
> Huh? That doesn't make any sense. LIMIT is applicable to every single
> node type with the exception of hash. The caching you talk about is
> applicable only to node types that parametrize their sub-nodes, of which
> there are exactly two instances.
>
> Limit doesn't shuttle through huge amounts of tuples normally. What you
> talk about does.
>
>
>
> > Also, just in case anyone is misunderstanding this Andres' argument.
> > It's entirely based on the performance impact of having an additional
> > node.
>
> Not entirely, no. It's also just that it doesn't make sense to have two
> nodes setting parameters that then half magically picked up by a special
> subsidiary node type and used as a cache key. This is pseudo modularity,
> not real modularity. And makes it harder to display useful information
> in explain etc. And makes it harder to e.g. clear the cache in cases we
> know that there's no further use of the current cache. At least without
> piercing the abstraction veil.
>
>
> > However, given the correct planner choice, there will never be
> > a gross slowdown due to having the extra node.
>
> There'll be a significant reduction in increase in performance.

If this is a key blocking factor for this topic, I'd like to do a simple
hack
to put the cache function into the subplan node, then do some tests to
show the real difference. But it is better to decide how much difference
can be thought of as a big difference. And for education purposes,
I'd like to understand where these differences come from. For my
current knowledge, my basic idea is it saves some function calls?

>

> > I understand that you've voiced your feelings about this, but what I
> > want to know is, how strongly do you feel about overloading the node?
> > Will you stand in my way if I want to push ahead with the separate
> > node? Will anyone else?
>
> I feel pretty darn strongly about this. If there's plenty people on your
> side I'll not stand in your way, but I think this is a bad design based on
> pretty flimsy reasons.
>
>
Nice to see the different opinions from two great guys and interesting to
see how this can be resolved at last:)

--
Best Regards
Andy Fan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andy Fan 2020-08-25 17:37:08 Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Previous Message Ranier Vilela 2020-08-25 17:13:42 Out-of-bounds access (ARRAY_VS_SINGLETON) (src/backend/access/nbtree/nbtdedup.c)