Re: incorrect row estimates for primary key join

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Ben <midfield(at)gmail(dot)com>, bricklen <bricklen(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org list" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: incorrect row estimates for primary key join
Date: 2013-06-25 23:36:17
Message-ID: 19479.1372203377@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Kevin Grittner <kgrittn(at)ymail(dot)com> writes:
> Ben <midfield(at)gmail(dot)com> wrote:
>> it seems to me that an equality join between two relations (call them A and B)
>> using columns in relation B with a unique constraint should yield row estimates
>> which are at most equal to the row estimates for relation A. my questions are
>>
>> 1 - is this correct?
>>
>> 2 - does the postgresql planner implement this when generating row estimates?

> That seems intuitive, but some of the estimates need to be made
> before all such information is available. Maybe we can do
> something about that some day....
> Maybe someone else will jump in here with more details than I can
> provide (at least without hours digging in the source code).

It does not attempt to match up query WHERE clauses with indexes during
selectivity estimation, so the existence of a multi-column unique
constraint wouldn't help it improve the estimate.

In the case at hand, I doubt that a better result rowcount estimate
would have changed the planner's opinion of how to do the join. The OP
seems to be imagining that 2 million index probes into a large table
would be cheap, but that's hardly true. It's quite likely that the
mergejoin actually is the best way to do the query. If it isn't really
best on his hardware, I would think that indicates a need for some
tuning of the cost parameters. Another thing that might be helpful for
working with such large tables is increasing work_mem, to make hashes
and sorts run faster.

regards, tom lane

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Ben 2013-06-26 00:29:01 Re: incorrect row estimates for primary key join
Previous Message Josh Berkus 2013-06-25 23:10:40 Re: Weird, bad 0.5% selectivity estimate for a column equal to itself