Re: index v. seqscan for certain values

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: jdunn(at)autorevenue(dot)com
Cc: "Postgresql Performance" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: index v. seqscan for certain values
Date: 2004-04-12 17:51:28
Message-ID: 12037.1081792288@sss.pgh.pa.us
Lists: pgsql-performance

"Jeremy Dunn" <jdunn(at)autorevenue(dot)com> writes:
> The question: why does the planner consider a sequential scan to be
> better for these top 10 values?

At some point a seqscan *will* be better. In the limit, if the key
being sought is common enough to occur on every page of the table,
it's certain that a seqscan will require less I/O than an indexscan
(because reading the index isn't actually saving you any heap fetches).
In practice the breakeven point is less than that because Unix kernels
are better at handling sequential than random access.
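
If you want to see where that breakeven actually falls on your data, you
can price both plans yourself, for instance (using the table and value
from your message; turning enable_seqscan off just forbids the seqscan
so the planner is forced to show you the indexscan's cost):

    -- default plan, presumably the seqscan
    explain analyze select count(*) from xxx where cid = 7191032;

    -- temporarily disallow seqscans so the indexscan gets priced and run
    set enable_seqscan = off;
    explain analyze select count(*) from xxx where cid = 7191032;
    set enable_seqscan = on;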

Your gripe appears to be basically that the planner's idea of the
breakeven point is off a bit. It looks to me like it's within about
a factor of 2 of being right, though, which is not all that bad when
it's using generic cost parameters.

> A) alter table xxx alter column cid set statistics 500;
> analyze xxx;
> This does not affect the results.

It probably improved the accuracy of the row count estimates, no?
The estimate you show for cid=7191032 is off by more than 25% (50792
estimated against 37765 actually returned), which seems like a lot of
error for one of the most common values in the table. (I hope that was
with the default stats target and not 500.)
not 500.) That leads directly to a 25% overestimate of the cost of
an indexscan, while having IIRC no impact on the cost of a seqscan.
Since the cost gap between the seqscan and the indexscan was more than
25%, fixing the estimate alone didn't change the selected plan, but you
want to correct that error as best you can before you move on to
tweaking cost parameters.
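
It's also worth checking what the planner actually has to work with:
with a higher statistics target, cid=7191032 should show up in
most_common_vals with a frequency close to its real share of the table.
Something like:

    -- inspect the gathered statistics for the cid column
    select null_frac, n_distinct, most_common_vals, most_common_freqs
    from pg_stats
    where tablename = 'xxx' and attname = 'cid';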

> C) decreasing cpu_index_tuple_cost by a factor of up to 1000, with no
> success

Wrong thing. You should be tweaking random_page_cost. Looks to me like
a value near 2 might be appropriate for your setup. Also it is likely
appropriate to increase effective_cache_size, which is awfully small in
the default configuration. I'd set that to something related to your
available RAM before trying to home in on a suitable random_page_cost.
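
Something along these lines, with the numbers as placeholders rather
than recommendations (note that effective_cache_size is counted in 8 kB
disk pages, so 100000 is roughly 800 MB):

    -- session-level while experimenting; put the final values in postgresql.conf
    set random_page_cost = 2;            -- default is 4
    set effective_cache_size = 100000;   -- 8 kB pages, ~800 MB; size it to the RAM
                                         -- the kernel can devote to caching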

AFAIK hardly anyone bothers with changing the cpu_xxx costs ...

regards, tom lane
