Re: BUG #17213: Wrong result from a query involving Merge Semi Join and Memoize

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Elvis Pranskevichus <elprans(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17213: Wrong result from a query involving Merge Semi Join and Memoize
Date: 2021-10-26 07:50:56
Message-ID: CAApHDvotKW3wt36gqw2M0yjhN0jm9yQsEzLu6BCpUXwurH8Gpg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Tue, 5 Oct 2021 at 20:05, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> I'm currently thinking over what the best fix for this should be. My
> current thinking is that all parameters that are from above the
> Memoize that are required below the Memoize must be part of the Cache
> Key. Currently, it seems, that's not the case and in the plan from the
> attached plan, i1.id is being used below the Memoize but originates
> from above it.
>
> I'll need to look into how exactly to determine around when
> get_memoize_path() is called how we determine if there's any subplan
> parameters that will be needed in the Memoize path's sub path that
> originates above the memoize node.

I was looking at fixes for this bug. I started out with seeing what
would be required to make paraminfo_get_equal_hashops() find all the
SubPlan parameters that are on the inner side of the join that come
from a location other than the inner side of the join. The only way
I could think on how to do that would be to invent some way to
traverse Paths to look for vars from a given set of Relids. Right now
we just have ways to walk expression trees (expression_tree_walker).
That's not going to work here. Let's call this way "option 1".

An alternative way to fix the problem would be just to flush the cache
whenever a Param changes that is not part of the cache key. The
Materalize node works a bit like this, except it'll just flush the
cache whenever a param changes, since the Material cache is not
parameterized. One fairly hefty drawback with doing it this way is
that the planner does not get a chance to take into account the fact
that the cache could be flushed quite often. This could result in a
Nested Loop / Memoize join being chosen instead of a Hash or Merge
Join when the hash or merge join would be much more favourable. Let's
call this "option 2".

For option 1, I don't really like the sound of adding ways to traverse
paths in back branches. I'm also not too sure what the implication
would be for Custom scans. With option 2, I'm a bit worried that this
would cause Memoize to be costed much cheaper than it actually is due
to the planner thinking the cache hit ratio will be much higher than
it actually is.

Can anyone think of a 3rd option?

I've attached a patch for option 2.

David

Attachment Content-Type Size
memoize_flush_cache_when_noncache_param_changes.patch application/octet-stream 7.2 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Juan José Santamaría Flecha 2021-10-26 08:31:25 Re: BUG #17247: How to avoid crating multiple Foreign keys on same column on same table.
Previous Message Tom Lane 2021-10-26 06:03:54 Re: conchuela timeouts since 2021-10-09 system upgrade