Re: index v. seqscan for certain values

From: "Jeremy Dunn" <jdunn(at)autorevenue(dot)com>
To: "'Postgresql Performance'" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: index v. seqscan for certain values
Date: 2004-04-12 17:08:05
Message-ID: 001801c420b0$b8b79b30$4f01a8c0@jeremydunn
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Sorry I should have written that we do VACUUM VERBOSE ANALYZE every
night.

- Jeremy

-----Original Message-----
From: pgsql-performance-owner(at)postgresql(dot)org
[mailto:pgsql-performance-owner(at)postgresql(dot)org] On Behalf Of Bill Moran
Sent: Monday, April 12, 2004 12:09 PM
To: jdunn(at)autorevenue(dot)com
Cc: Postgresql Performance
Subject: Re: [PERFORM] index v. seqscan for certain values

Quick bit of input, since you didn't mention it.

How often do you run ANALYZE? I found it interesting that a database I
was doing tests on sped up by a factor of 20 after ANALYZE. If your
data changes a lot, you should probably schedule ANALYZE to run with
VACUUM.

Jeremy Dunn wrote:
> I've searched the archives and can't find an answer to this seemingly
> simple question. Apologies if it's too common.
>
> The table in question has ~1.3M rows. It has 85 columns, 5 of which
> have single-column indexes.
>
> The column in question (CID) has 183 distinct values. For these
> values,
> the largest has ~38,000 rows, and the smallest has 1 row. About 30
> values have < 100 rows, and about 10 values have > 20,000 rows.
>
> The database is 7.2.3 running on RedHat 7.1. (we are in process of
> upgrading to PG 7.4.2) All of the query plan options are enabled,
and
> the cpu costs are set to the default values. ( cpu_tuple_cost is 0.01,
> cpu_index_tuple_cost is 0.001). The database is VACUUM'd every night.
>
> The problem:
> A simply query:
> select count(*) from xxx where CID=<smalval>
> where <smalval> is a CID value which has relatively few rows, returns
> a
> plan using the index on that column.
>
> explain analyze select count(*) from xxx where cid=869366;
> Aggregate (cost=19136.33..19136.33 rows=1 width=0) (actual
> time=78.49..78.49 rows=1 loops=1)
> -> Index Scan using xxx_cid on emailrcpts (cost=0.00..19122.21
> rows=5648 width=0) (actual time=63.40..78.46 rows=1 loops=1)
> Total runtime: 78.69 msec
>
> The same plan is true for values which have up to about 20,000 rows:
>
> explain analyze select count(*) from xxx where cid=6223341;
> Aggregate (cost=74384.19..74384.19 rows=1 width=0) (actual
> time=11614.89..11614.89 rows=1 loops=1)
> -> Index Scan using xxx_cid on emailrcpts (cost=0.00..74329.26
> rows=21974 width=0) (actual time=35.75..11582.10 rows=20114 loops=1)
> Total runtime: 11615.05 msec
> However for the values that have > 20,000 rows, the plan changes to a
> sequential scan, which is proportionately much slower.
>
> explain analyze select count(*) from xxx where cid=7191032;
> Aggregate (cost=97357.61..97357.61 rows=1 width=0) (actual
> time=46427.81..46427.82 rows=1 loops=1)
> -> Seq Scan on xxx (cost=0.00..97230.62 rows=50792 width=0)
> (actual time=9104.45..46370.27 rows=37765 loops=1)
> Total runtime: 46428.00 msec
>
>
> The question: why does the planner consider a sequential scan to be
> better for these top 10 values? In terms of elapsed time it is more
> than twice as slow, proportionate to an index scan for the same number

> of rows.
>
> What I tried:
>
> A) alter table xxx alter column cid set statistics 500;
> analyze xxx;
> This does not affect the results.
>
> B) dropped/rebuilt the index, with no improvement.
>
> C) decreasing cpu_index_tuple_cost by a factor of up to 1000, with no
> success
>
> D) force an index scan for the larger values by using a very high
> value
> for cpu_tuple_cost (e.g. .5) but this doesn't seem like a wise thing
to do.
>
> Your thoughts appreciated in advance!
>
> - Jeremy
>
> 7+ years experience in Oracle performance-tuning
> relatively new to postgresql

--
Bill Moran
Potential Technologies
http://www.potentialtech.com

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Stephan Szabo 2004-04-12 17:39:51 Re: index v. seqscan for certain values
Previous Message Bill Moran 2004-04-12 16:09:15 Re: index v. seqscan for certain values