Re: index / sequential scan problem

From: Fabian Kreitner <fabian(dot)kreitner(at)ainea-ag(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: index / sequential scan problem
Date: 2003-07-18 04:47:05
Message-ID: 5.1.0.14.0.20030718063650.0397b238@195.145.148.245
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

At 20:12 17.07.2003, Tom Lane wrote:
>Fabian Kreitner <fabian(dot)kreitner(at)ainea-ag(dot)de> writes:
> > That is what I read too and is why Im confused that the index is indeed
> > executing faster. Can this be a problem with the hardware and/or postgress
> > installation?
>
>I think the actual issue here is that you are executing the EXISTS
>subplan over and over, once for each outer row. The planner's cost
>estimate for EXISTS is based on the assumption that you do it once
>... in which scenario the seqscan very possibly is cheaper. However,
>when you do the EXISTS subplan over and over for many outer rows, you
>get a savings from the fact that the index and table pages soon get
>cached in memory. The seqscan plan gets a savings too, since the table
>is small enough to fit in memory, but once everything is in memory the
>indexscan plan is faster.
>
>There's been some discussion on pghackers about how to teach the planner
>to account for repeated executions of subplans, but we have not come up
>with a good solution yet.
>
>For the moment, what people tend to do if they know their database is
>small enough to mostly stay in memory is to reduce random_page_cost to
>make the planner favor indexscans. If you know the database is entirely
>cached then the theoretically correct value of random_page_cost is 1.0
>(since fetching any page will cost the same, if it's all in RAM). I'd
>recommend against adopting that as a default, but a lot of people find
>that setting it to 2.0 or so seems to model their situation better than
>the out-of-the-box 4.0.

Thanks for the explanation :)

However .... :(

perg_1097=# vacuum analyze;
VACUUM
perg_1097=# set random_page_cost to 1.0;
SET VARIABLE
perg_1097=# EXPLAIN ANALYZE select notiz_id, obj_id, obj_typ
perg_1097-# from notiz_objekt a
perg_1097-# where not exists
perg_1097-# (
perg_1097(# select 1
perg_1097(# from notiz_gelesen b
perg_1097(# where ma_id = 2001
perg_1097(# and ma_pid = 1097
perg_1097(# and a.notiz_id = b.notiz_id
perg_1097(# )
perg_1097-# ;
NOTICE: QUERY PLAN:

Seq Scan on notiz_objekt a (cost=0.00..56125.80 rows=15561 width=12)
(actual time=0.27..2299.09 rows=31122 loops=1)
SubPlan
-> Seq Scan on notiz_gelesen b (cost=0.00..1.79 rows=1 width=0)
(actual time=0.07..0.07 rows=0 loops=31122)
Total runtime: 2328.05 msec

EXPLAIN
perg_1097=#

...

perg_1097=# set enable_seqscan to false;
SET VARIABLE
perg_1097=# set random_page_cost to 1.0;
SET VARIABLE
perg_1097=# EXPLAIN ANALYZE select notiz_id, obj_id, obj_typ
perg_1097-# from notiz_objekt a
perg_1097-# where not exists
perg_1097-# (
perg_1097(# select 1
perg_1097(# from notiz_gelesen b
perg_1097(# where ma_id = 2001
perg_1097(# and ma_pid = 1097
perg_1097(# and a.notiz_id = b.notiz_id
perg_1097(# )
perg_1097-# ;
NOTICE: QUERY PLAN:

Seq Scan on notiz_objekt a (cost=100000000.00..100093380.36 rows=15561
width=12) (actual time=0.07..550.07 rows=31122 loops=1)
SubPlan
-> Index Scan using idx_notiz_gelesen_2 on notiz_gelesen
b (cost=0.00..2.98 rows=1 width=0) (actual time=0.01..0.01 rows=0 loops=31122)
Total runtime: 582.90 msec

EXPLAIN
perg_1097=#

Even with a random page cost of 1 it thinks using the index should/could
take significantly longer which it doesnt for some reason :-/

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Fabian Kreitner 2003-07-18 05:18:27 Re: index / sequential scan problem
Previous Message Robert Creager 2003-07-18 03:23:08 Re: Hardware performance