Re: Weird, bad 0.5% selectivity estimate for a column equal to itself

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	"pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: Weird, bad 0.5% selectivity estimate for a column equal to itself
Date:	2013-06-26 01:41:47
Message-ID:	22653.1372210907@sss.pgh.pa.us
Lists:	pgsql-performance

Josh Berkus <josh(at)agliodbs(dot)com> writes:
>> Personally, I'll bet lunch that that external software is outright
>> broken, i.e. it probably thinks "X = X" is constant true and they found
>> they could save two lines of code and a few machine cycles by emitting
>> that rather than not emitting anything.

> Well, it was more in the form of:
> tab1.x = COALESCE(tab2.y,tab1.x)

Hm. I'm not following how you get from there to complaining about not
being smart about X = X, because that surely ain't the same.

> Well, I'd be more satisfied with having a solution for:
> WHERE tab1.x = tab1.y
> ... in general, even if it didn't have correlation stats. Like, what's
> preventing us from using the same selectivity logic we would on a join
> for that?

It's a totally different case. In the join case you expect that each
element of one table will be compared with each element of the other.
In the single-table case, that's exactly *not* what will happen, and
I don't see how you get to anything very useful without knowing
something about the value pairs that actually occur. As a concrete
example, applying the join selectivity logic would certainly give a
completely wrong answer for X = X, unless there were only one value
occurring in the column.

regards, tom lane
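
A minimal sketch of the X = X point made in the message above, written in Python rather than against PostgreSQL itself. The function names and the toy columns are hypothetical, and the "join-style" estimate here is the textbook 1/ndistinct uniformity assumption, not a copy of the planner's actual code. NULLs are ignored for simplicity.

```python
from collections import Counter


def join_style_selectivity(column):
    """Textbook equi-join estimate under uniformity: 1 / number of distinct values.
    (Assumption for illustration; not PostgreSQL's actual estimator.)"""
    return 1.0 / len(set(column))


def cross_pair_match_fraction(column):
    """Fraction of ALL (row_i, row_j) pairs with equal values.
    This is what a self-join compares: every row against every row."""
    n = len(column)
    counts = Counter(column)
    return sum(c * c for c in counts.values()) / (n * n)


def same_row_match_fraction(xs, ys):
    """Fraction of rows where xs[i] == ys[i].
    This is what a single-table clause like WHERE x = y evaluates."""
    return sum(1 for x, y in zip(xs, ys) if x == y) / len(xs)


# Toy column: 10,000 rows, 100 distinct values, uniformly distributed.
x = [i % 100 for i in range(10_000)]
# A second column with the same value distribution but never equal to x in the same row.
y_shifted = [(i + 1) % 100 for i in range(10_000)]
# A column holding a single value.
constant = [7] * 10_000

join_estimate = join_style_selectivity(x)            # 1/100
cross_fraction = cross_pair_match_fraction(x)        # matches the join estimate
x_equals_x = same_row_match_fraction(x, x)           # every row matches itself
x_equals_y = same_row_match_fraction(x, y_shifted)   # no row matches

print(join_estimate, cross_fraction, x_equals_x, x_equals_y)
```

In this toy setup the join-style figure agrees with the all-pairs comparison, but it says nothing about the same-row comparison: x and y_shifted have identical per-column statistics, yet WHERE x = x keeps every row and WHERE x = y_shifted keeps none. Only for the single-valued column do the join-style estimate and the X = X result coincide.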

In response to

Re: Weird, bad 0.5% selectivity estimate for a column equal to itself at 2013-06-25 23:10:40 from Josh Berkus

Responses

Re: Weird, bad 0.5% selectivity estimate for a column equal to itself at 2013-06-26 23:04:02 from Josh Berkus
