Re: Forcing query to use an index

From: Michael Nachbaur <mike(at)nachbaur(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Forcing query to use an index
Date: 2003-03-03 22:46:52
Message-ID: 06B62341-4DCA-11D7-90E0-000A27935D5A@nachbaur.com
Lists: pgsql-sql

On Monday, Mar 3, 2003, at 14:21 US/Pacific, Greg Stark wrote:
> You should send the query as well, and \d customer_month_summary so we
> can see
> how you defined your indexes.

Table "customer_month_summary"
Column | Type | Modifiers
---------------+--------------------------+-----------
customerid | integer |
timestart | timestamp with time zone |
timeend | timestamp with time zone |
accountnum | character varying(255) |
firstname | character varying(255) |
lastname | character varying(255) |
organization | character varying(255) |
package | character varying(255) |
up | bigint |
down | bigint |
maxup | bigint |
maxdown | bigint |
violatedsize | bigint |
maxtotal | bigint |
total | bigint |
violatedup | bigint |
violateddown | bigint |
violatedtotal | bigint |
monthstart | date |
Indexes: customer_month_summary_cid_idx,
customer_month_summary_cm_idx,
customer_month_summary_time_idx
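
The \d output only lists the index names; off the top of my head, something
like this against the pg_indexes system view should pull up the actual
column lists if that helps (untested sketch):

-- indexdef holds the CREATE INDEX statement behind each index
SELECT indexname, indexdef
  FROM pg_indexes
 WHERE tablename = 'customer_month_summary';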

> There doesn't seem to be a filter on the scan so it looks like
> postgres thinks
> you're actually reading in the entire table, which is normally faster
> with a
> sequential scan than an index scan. In fact I'm surprised it's doing
> an index
> scan on the other table and not a sequential scan.
>
> Some things to try:
>
> set enable_seqscan = off
>
> Then try your query again, see if postgres is right and it really is
> faster to
> do the sequential scan.

Sort  (cost=100014872.07..100014872.07 rows=268 width=265) (actual time=382.51..382.51 rows=6 loops=1)
  ->  Aggregate  (cost=100014693.83..100014861.27 rows=268 width=265) (actual time=381.93..382.33 rows=6 loops=1)
        ->  Group  (cost=100014693.83..100014854.57 rows=2679 width=265) (actual time=381.79..381.90 rows=6 loops=1)
              ->  Sort  (cost=100014693.83..100014693.83 rows=2679 width=265) (actual time=381.75..381.75 rows=6 loops=1)
                    ->  Merge Join  (cost=100000005.18..100014541.30 rows=2679 width=265) (actual time=38.21..381.13 rows=6 loops=1)
                          ->  Nested Loop  (cost=0.00..14525.77 rows=2679 width=247) (actual time=0.14..149.21 rows=2679 loops=1)
                                ->  Index Scan using customer_id_key on customer c  (cost=0.00..129.73 rows=2679 width=156) (actual time=0.06..36.92 rows=2679 loops=1)
                                ->  Index Scan using customer_month_summary_cm_idx on customer_month_summary cms  (cost=0.00..5.36 rows=1 width=91) (actual time=0.02..0.03 rows=1 loops=2679)
                          ->  Sort  (cost=100000005.18..100000005.18 rows=77 width=18) (actual time=0.44..0.56 rows=77 loops=1)
                                ->  Seq Scan on emailaddress ea  (cost=100000000.00..100000002.77 rows=77 width=18) (actual time=0.01..0.24 rows=77 loops=1)
Total runtime: 383.25 msec

So from the looks of things the index is way faster (it probes 1 summary row
per customer instead of sorting through all 8000+).

> set random_page_cost = 2
>
> Or even lower values.

Sort  (cost=6655.12..6655.12 rows=268 width=265) (actual time=902.75..902.76 rows=6 loops=1)
  ->  Aggregate  (cost=6476.88..6644.32 rows=268 width=265) (actual time=902.18..902.58 rows=6 loops=1)
        ->  Group  (cost=6476.88..6637.62 rows=2679 width=265) (actual time=902.01..902.13 rows=6 loops=1)
              ->  Sort  (cost=6476.88..6476.88 rows=2679 width=265) (actual time=901.97..901.98 rows=6 loops=1)
                    ->  Merge Join  (cost=6106.42..6324.34 rows=2679 width=265) (actual time=585.73..901.35 rows=6 loops=1)
                          ->  Merge Join  (cost=6101.24..6308.82 rows=2679 width=247) (actual time=549.53..667.35 rows=2679 loops=1)
                                ->  Index Scan using customer_id_key on customer c  (cost=0.00..118.77 rows=2679 width=156) (actual time=0.25..42.08 rows=2679 loops=1)
                                ->  Sort  (cost=6101.24..6101.24 rows=8117 width=91) (actual time=549.21..555.19 rows=8117 loops=1)
                                      ->  Seq Scan on customer_month_summary cms  (cost=0.00..5574.17 rows=8117 width=91) (actual time=252.90..472.29 rows=8117 loops=1)
                          ->  Sort  (cost=5.18..5.18 rows=77 width=18) (actual time=0.70..0.81 rows=77 loops=1)
                                ->  Seq Scan on emailaddress ea  (cost=0.00..2.77 rows=77 width=18) (actual time=0.08..0.35 rows=77 loops=1)
Total runtime: 905.47 msec

This is with enable_seqscan turned off, so it doesn't look like it's
doing much good.
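
One thing probably worth double-checking is what the session actually had in
effect when each plan was captured, since these settings are per-session and
a fresh psql session starts back at the defaults. A quick sanity check,
roughly:

-- confirm the planner settings in effect for this session
SHOW enable_seqscan;
SHOW random_page_cost;

-- both can be combined in one session before re-running EXPLAIN ANALYZE
SET enable_seqscan = off;
SET random_page_cost = 2;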

> I've also had some success with raising cpu_tuple_cost, though I'm
> unclear on
> whether that's actually a good approach or not.
>
> Also, don't forget to do a vacuum full on these tables before doing
> testing for optimizations at this level. You can get some confusing
> results if
> your tables have lots of empty holes in them.

IIRC this locks the tables, so I'll have to run this tonight so the
users of this system don't race into my office to tar and feather me.
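
In the meantime, as far as I know a plain VACUUM ANALYZE doesn't take the
exclusive lock that VACUUM FULL does, so something along these lines should
be safe to run during the day, leaving the FULL pass for the overnight
window (just a sketch):

-- plain VACUUM needs only a non-exclusive lock, so users can keep working
VACUUM ANALYZE customer_month_summary;
VACUUM ANALYZE customer;
VACUUM ANALYZE emailaddress;

-- the exclusive-locking pass, reserved for tonight
-- VACUUM FULL ANALYZE customer_month_summary;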

Does using "set enable_seqscan = off" impact anything else on the
system, or is it connection-specific? Is this the recommended way of
doing this, or is there another way of coercing Postgres into using
indexes? I'm making these queries from Perl code running in an XSP
page, so I'm not sure how flexible this option would be from a
developer point of view.
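
From what I understand, SET only affects the current backend, so presumably
the pattern is to scope it around the one query that needs it, roughly what
the Perl code would end up issuing on its database handle:

-- SET is per-connection: other backends never see this
SET enable_seqscan = off;

-- ... run the one summary query that needs the index plan ...

-- put the planner back to normal for the rest of the connection
SET enable_seqscan = on;   -- or: RESET enable_seqscan;

I gather 7.3 also has SET LOCAL, which only lasts until the end of the
current transaction; that might be even tidier if the connection is reused.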

Thanks for your help.
