Re: Weird, bad 0.5% selectivity estimate for a column equal to itself

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Weird, bad 0.5% selectivity estimate for a column equal to itself
Date: 2013-06-26 01:41:47
Message-ID: 22653.1372210907@sss.pgh.pa.us
Lists: pgsql-performance

Josh Berkus <josh(at)agliodbs(dot)com> writes:
>> Personally, I'll bet lunch that that external software is outright
>> broken, ie it probably thinks "X = X" is constant true and they found
>> they could save two lines of code and a few machine cycles by emitting
>> that rather than not emitting anything.

> Well, it was more in the form of:
> tab1.x = COALESCE(tab2.y,tab1.x)

Hm. I'm not following how you get from there to complaining about not
being smart about X = X, because that surely ain't the same.

> Well, I'd be more satisfied with having a solution for:
> WHERE tab1.x = tab1.y
> ... in general, even if it didn't have correlation stats. Like, what's
> preventing us from using the same selectivity logic we would on a join
> for that?

It's a totally different case. In the join case you expect that each
element of one table will be compared with each element of the other.
In the single-table case, that's exactly *not* what will happen, and
I don't see how you get to anything very useful without knowing
something about the value pairs that actually occur. As a concrete
example, applying the join selectivity logic would certainly give a
completely wrong answer for X = X, unless there were only one value
occurring in the column.
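
To make the X = X example concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the planner's actual code: the join-style estimate is simplified to 1 / max(number of distinct values), which assumes uniformly distributed values, no NULLs, and no most-common-value statistics. All function names are hypothetical.

```python
from itertools import product

def join_style_selectivity(left, right):
    """Simplified equijoin estimate (assumption): 1 / max(distinct counts).
    Assumes uniform values and no NULLs; not the real planner code."""
    return 1.0 / max(len(set(left)), len(set(right)))

def cross_pair_selectivity(left, right):
    """Fraction of ALL (l, r) pairs that match, i.e. what a join compares."""
    matches = sum(1 for l, r in product(left, right) if l == r)
    return matches / (len(left) * len(right))

def same_row_selectivity(rows):
    """Fraction of rows whose two columns are equal WITHIN the same row."""
    return sum(1 for a, b in rows if a == b) / len(rows)

# 200 rows, 10 distinct values, each occurring 20 times.
x = [i % 10 for i in range(200)]

# Join view: every element compared with every element.
estimate = join_style_selectivity(x, x)
cross = cross_pair_selectivity(x, x)

# Single-table view of X = X: each row is compared only with itself.
x_equals_x = same_row_selectivity(list(zip(x, x)))

# Same per-column distribution, different pairing: y never equals x.
y = [(v + 1) % 10 for v in x]
x_equals_y = same_row_selectivity(list(zip(x, y)))

# With a single distinct value, the join-style estimate happens to be right.
one_value = [7] * 50
estimate_one = join_style_selectivity(one_value, one_value)
actual_one = same_row_selectivity(list(zip(one_value, one_value)))

print(f"join-style estimate:      {estimate:.2f}")
print(f"cross-pair match rate:    {cross:.2f}")
print(f"actual X = X:             {x_equals_x:.2f}")
print(f"actual X = Y (shifted):   {x_equals_y:.2f}")
print(f"single value est/actual:  {estimate_one:.2f} / {actual_one:.2f}")
```

In this sketch the join-style estimate matches the cross-pair match rate, but it says nothing about the same-row case: two columns with identical distributions can match in every row or in none, depending only on which value pairs actually occur.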

regards, tom lane
