Quick Links

Re: Odd problem with planner choosing seq scan

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Colin McGuigan <cmcguigan(at)earthcomber(dot)com>
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: Odd problem with planner choosing seq scan
Date:	2007-04-21 17:06:37
Message-ID:	25028.1177175197@sss.pgh.pa.us
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Colin McGuigan <cmcguigan(at)earthcomber(dot)com> writes:
> -> Subquery Scan s (cost=0.00..21.93 rows=1 width=8)
> Filter: ((userid = 123456) AND (locationid IS NULL))
> -> Limit (cost=0.00..15.30 rows=530 width=102)
> -> Seq Scan on staff (cost=0.00..15.30 rows=530 width=102)

There does seem to be a bug here, but not the one you think: the rows=1
estimate for the subquery node seems a bit silly given that it knows
there are 530 rows in the underlying query. I'm not sure how bright the
code is about finding stats for variables emitted by a subquery, but
even with totally default estimates it should not come up with a
selectivity of 1/500 for the filter. Unfortunately, fixing that is
likely to bias it further away from the plan you want ...

> Furthermore, I can repeat this experiment over and over, so I know that
> its not caching.

You mean it *is* caching.

> I'd really prefer this query run in < 1 second rather than > 45, but I'd
> really like to do that without having hacks like adding in pointless
> LIMIT clauses.

The right way to do it is to adjust the planner cost parameters.
The standard values of those are set on the assumption of
tables-much-bigger-than-memory, a situation in which the planner's
preferred plan probably would be the best. What you are testing here
is most likely a situation in which the whole of both tables fits in
RAM. If that pretty much describes your production situation too,
then you should decrease seq_page_cost and random_page_cost. I find
setting them both to 0.1 produces estimates that are more nearly in
line with true costs for all-in-RAM situations.

(Pre-8.2, there's no seq_page_cost, so instead set random_page_cost
to 1 and inflate all the cpu_xxx cost constants by 10.)

regards, tom lane

In response to

Odd problem with planner choosing seq scan at 2007-04-21 15:33:38 from Colin McGuigan

Responses

Re: Odd problem with planner choosing seq scan at 2007-04-22 01:48:22 from Colin McGuigan

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Jeroen Kleijer	2007-04-21 20:17:42	not using indexes on large table
Previous Message	Stephan Szabo	2007-04-21 16:18:16	Re: FK triggers misused?