Re: neqjoinsel versus "refresh materialized view concurrently"

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: neqjoinsel versus "refresh materialized view concurrently"
Date: 2018-03-14 01:56:30
Message-ID: CAMkU=1ywGwT9n0dAVC+8Ssys3CjtcVak2WU6U_Up4-ocAOx=PQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 13, 2018 at 4:57 PM, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com
> wrote:

> On Wed, Mar 14, 2018 at 12:29 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
> >> There is a fundamental and complicated estimation problem lurking here
> >> of course and I'm not sure what to think about that yet. Maybe there
> >> is a very simple fix for this particular problem:
> >
> > Ah, I see you thought of the same hack I did.
> >
> > I think this may actually be a good fix, and here's the reason: this plan
> > is in fact being driven entirely off planner default estimates, because
> > we don't have any estimation code that knows what to do with
> > "wholerowvar *= wholerowvar". I'm suspicious that we could drop the
> > preceding ANALYZE as being a waste of cycles, except maybe it's finding
> > out the number of rows for us. In any case, LIMIT 1 is only a good idea
> > to the extent that the planner knows what it's doing, and this is an
> > example where it demonstrably doesn't and won't any time soon.
>
> Hmm. I wonder if the ANALYZE might have been needed to avoid the
> nested loop plan at some point in history.
>
> Here's a patch to remove LIMIT 1, which fixes the plan for Jeff's test
> scenario and some smaller and larger examples I tried. The query is
> already executed with SPI_execute(..., 1) so it'll give up after one
> row anyway. The regression test includes a case that causes a row to
> be produced here and that's passing ('ERROR: new data for
> materialized view "mvtest_mv" contains duplicate rows without any null
> columns').
>

Is there any good way to make the regression tests fail if the plan reverts
to the bad one? The only thing I can think of would be to make the table
bigger so the regression tests becomes "noticeably slower", but that is
pretty vague and not user friendly to formally pass and just hope it is
slow enough for someone to investigate.

Cheers,

Jeff

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jim Finnerty 2018-03-14 01:57:17 Re: add queryEnv to ExplainOneQuery_hook
Previous Message Michael Paquier 2018-03-14 01:54:39 Re: Fixes for missing schema qualifications