Re: Planner reluctant to start from subquery

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Planner reluctant to start from subquery
Date: 2006-02-01 20:43:01
Message-ID: 43E0C8F5.EE98.0025.0@wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

>>> On Wed, Feb 1, 2006 at 2:36 pm, in message
<4359(dot)1138826175(at)sss(dot)pgh(dot)pa(dot)us>,
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
>> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> I'm interested to poke at this ... are you in a position to provide
a
>>> test case?
>
>> I can't supply the original data, since many of the tables have
>> millions of rows, with some of the data (related to juvenile,
paternity,
>> sealed, and expunged cases) protected by law. I could try to put
>> together a self- contained example, but I'm not sure the best way to
do
>> that, since the table sizes and value distributions may be
significant
>> here. Any thoughts on that?
>
> I think that the only aspect of the data that really matters here is
the
> number of distinct values, which would affect decisions about
whether
> HashAggregate is appropriate or not. And you could probably get the
> same thing to happen with at most a few tens of thousands of rows.
>
> Also, all we need to worry about is the columns used in the
WHERE/JOIN
> conditions, which looks to be mostly case numbers, dates, and county
> identification ... how much confidential info is there in that? At
> worst you could translate the case numbers to some randomly
generated
> identifiers.

OK, I could probably obliterate name, addresses, etc. in a copy of the
data (those aren't significant to the query anyway) and provide a test
case. However, I just found another clue.

Since you were so confident it couldn't be the outer join, I went
looking for what else I changed at the same time. I eliminated the code
referencing that table, which contained an OR. I've seen ORs cause
nasty problems with optimizers in the past. I took out the OR in the
where clause, without eliminating that last outer join, and it optimized
fine.

I'll hold off a bit to see if you still need the test case. ;-)

-Kevin

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Kevin Grittner 2006-02-01 20:50:51 Re: Planner reluctant to start from subquery
Previous Message Tom Lane 2006-02-01 20:41:09 Re: Index Usage using IN