Re: Planner chooses multi-column index in 9.2 when maybe it should not

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Sabino Mullane <greg(at)endpoint(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Planner chooses multi-column index in 9.2 when maybe it should not
Date: 2012-10-11 01:24:32
Message-ID: 15816.1349918672@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Greg Sabino Mullane <greg(at)endpoint(dot)com> writes:
> -> Bitmap Heap Scan on foobar o (C=30389..835271 R=8980 W=8) (AT=0.06..0.07 R=1 L=1)
> Recheck Cond: (((id) >= (m.id)) AND ((id) <= ((m.id) || '.999999')))
> Filter: (((status) <> ALL ('{panda,penguin}'[])) \
> AND ((id) ~ (('^' || (m.id)) || '(\.[0-9.]+)?$')))
> -> Bitmap Index Scan on foobar_pkey (C=0..30386 R=1888670 W=0) (AT=0.02..0.02 R=1 L=1)
> Index Cond: (((id) >= (m.id)) AND ((id) <= ((m.id) || '.999999')))

Actually, looking closer, I think the problem is not with the estimation
of the index-only scan on the other index; the planner is estimating
that as pretty expensive, which it is. The problem is that it thinks
the above bitmap scan is pretty expensive, when it isn't. And the
reason evidently is that it's totally off in the weeds about the
selectivity of the range condition on foobar.id. Anytime you've got
1888670 estimated rows and 1 actual row, you've got a problem.

This is related to the problem I was on about a couple weeks ago:
http://archives.postgresql.org/message-id/17655.1348874742@sss.pgh.pa.us
namely that the planner fails to recognize pairs of clauses as a range
constraint if they're join clauses. If it had recognized that, you'd
have gotten an estimate that would still be far more than "1 row", but
would be more than an order of magnitude less than this one, which would
be enough to fix this problem.

I'd not been thinking of that change as something we'd risk
back-patching, but maybe we should consider putting it into 9.2. It
seems like the index-only scan support has put a new level of premium on
the quality of the planner's rowcount estimates.

Meanwhile, that range condition in itself looks a tad, er, klugy.
Do you really need that, or is this a crummy way of stating
foobar.id = m.id?

regards, tom lane

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2012-10-11 02:09:53 Re: Can two “SELECT FOR UPDATE” statements on the same table cause a deadlock?
Previous Message Gavin Flower 2012-10-11 00:47:14 Re: Index only scan