Re: Planner reluctant to start from subquery

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Planner reluctant to start from subquery
Date: 2006-02-01 20:24:39
Message-ID: 43E0C4A7.EE98.0025.0@wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

>>> On Wed, Feb 1, 2006 at 2:14 pm, in message
<4218(dot)1138824885(at)sss(dot)pgh(dot)pa(dot)us>,
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
>> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> ... expected an equivalent IN clause to work better. In fact, I'm
not
>>> clear why the planner isn't finding the cheapest plan (which it
does
>>> estimate as cheapest) from the IN version you posted.
>
>> All I know is that trying various permutations, I saw it pick a
good
>> plan for the IN format when I eliminated the last outer join in the
FROM
>> clause. I know it isn't conclusive, but it was a correlation which
>> suggested a possible causality to me.
>
> But there is still an outer join in your third example (the one with
the
> best plan), so that doesn't seem to hold water.

Right, if I moved the DocImageMetaData from a subquery in the WHERE
clause up to the FROM clause, or I eliminated all OUTER JOINs, it chose
a good plan. Of course, this was just playing with a few dozen
permutations, so it proves nothing -- I'm just sayin'....

> In any case, the way
> that IN planning works these days it really should have considered
the
> plan equivalent to your JOIN- against- GROUP- BY variant.
>
> I'm interested to poke at this ... are you in a position to provide
a
> test case?

I can't supply the original data, since many of the tables have
millions of rows, with some of the data (related to juvenile, paternity,
sealed, and expunged cases) protected by law. I could try to put
together a self-contained example, but I'm not sure the best way to do
that, since the table sizes and value distributions may be significant
here. Any thoughts on that?

-Kevin

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Jeffrey W. Baker 2006-02-01 20:28:19 Re: Index Usage using IN
Previous Message Jeffrey W. Baker 2006-02-01 20:22:50 Re: Index Usage using IN