| From: | "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> |
|---|---|
| To: | "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Cc: | <pgsql-performance(at)postgresql(dot)org> |
| Subject: | Re: Planner reluctant to start from subquery |
| Date: | 2006-02-01 20:43:01 |
| Message-ID: | 43E0C8F5.EE98.0025.0@wicourts.gov |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-performance |
>>> On Wed, Feb 1, 2006 at 2:36 pm, in message
<4359(dot)1138826175(at)sss(dot)pgh(dot)pa(dot)us>,
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
>> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> I'm interested to poke at this ... are you in a position to provide
a
>>> test case?
>
>> I can't supply the original data, since many of the tables have
>> millions of rows, with some of the data (related to juvenile,
paternity,
>> sealed, and expunged cases) protected by law. I could try to put
>> together a self- contained example, but I'm not sure the best way to
do
>> that, since the table sizes and value distributions may be
significant
>> here. Any thoughts on that?
>
> I think that the only aspect of the data that really matters here is
the
> number of distinct values, which would affect decisions about
whether
> HashAggregate is appropriate or not. And you could probably get the
> same thing to happen with at most a few tens of thousands of rows.
>
> Also, all we need to worry about is the columns used in the
WHERE/JOIN
> conditions, which looks to be mostly case numbers, dates, and county
> identification ... how much confidential info is there in that? At
> worst you could translate the case numbers to some randomly
generated
> identifiers.
OK, I could probably obliterate name, addresses, etc. in a copy of the
data (those aren't significant to the query anyway) and provide a test
case. However, I just found another clue.
Since you were so confident it couldn't be the outer join, I went
looking for what else I changed at the same time. I eliminated the code
referencing that table, which contained an OR. I've seen ORs cause
nasty problems with optimizers in the past. I took out the OR in the
where clause, without eliminating that last outer join, and it optimized
fine.
I'll hold off a bit to see if you still need the test case. ;-)
-Kevin
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Kevin Grittner | 2006-02-01 20:50:51 | Re: Planner reluctant to start from subquery |
| Previous Message | Tom Lane | 2006-02-01 20:41:09 | Re: Index Usage using IN |