Re: Planner reluctant to start from subquery

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Planner reluctant to start from subquery
Date: 2006-02-01 20:36:15
Message-ID: 4359.1138826175@sss.pgh.pa.us
Lists: pgsql-performance

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I'm interested to poke at this ... are you in a position to provide a
>> test case?

> I can't supply the original data, since many of the tables have
> millions of rows, with some of the data (related to juvenile, paternity,
> sealed, and expunged cases) protected by law. I could try to put
> together a self-contained example, but I'm not sure the best way to do
> that, since the table sizes and value distributions may be significant
> here. Any thoughts on that?

I think that the only aspect of the data that really matters here is the
number of distinct values, which would affect decisions about whether
HashAggregate is appropriate or not. And you could probably get the
same thing to happen with at most a few tens of thousands of rows.

Also, all we need to worry about is the columns used in the WHERE/JOIN
conditions, which look to be mostly case numbers, dates, and county
identification ... how much confidential info is there in that? At
worst you could translate the case numbers to some randomly generated
identifiers.

regards, tom lane
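
The suggestion above, translating case numbers to randomly generated identifiers, could be sketched roughly as follows. This is only an illustration, not anything posted in the thread: the column names, the sample rows, and the helper make_pseudonymizer are all hypothetical. The one property it tries to preserve is the one the message says matters: each distinct original value maps to exactly one distinct replacement, so the number of distinct values and the join relationships stay the same.

```python
import secrets


def make_pseudonymizer(id_length=12):
    """Return a function that maps each original value to a stable random ID.

    Hypothetical helper: the same input always yields the same output, and
    different inputs never share an output, so distinct-value counts and
    equality joins are unchanged.
    """
    mapping = {}   # original value -> random identifier
    used = set()   # identifiers already handed out

    def pseudonymize(value):
        if value not in mapping:
            token = secrets.token_hex(id_length // 2)
            while token in used:  # retry on the (unlikely) collision
                token = secrets.token_hex(id_length // 2)
            used.add(token)
            mapping[value] = token
        return mapping[value]

    return pseudonymize


# Hypothetical sample rows: (case_no, county_no, filing_date).
rows = [
    ("2005CF000123", 13, "2005-03-01"),
    ("2005CF000123", 13, "2005-04-15"),
    ("2006JV000007", 40, "2006-01-09"),
]

anon = make_pseudonymizer()
# Only the case number is replaced; county and date are left as they are.
scrubbed = [(anon(case_no), county, date) for case_no, county, date in rows]
```

The same pseudonymize function would need to be applied to every table that carries the case number, so that joins between the scrubbed tables still match the way the originals did.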
