From: | Kelly Burkhart <kelly(at)tradebotsystems(dot)com> |
---|---|
To: | pgsql-sql(at)postgresql(dot)org |
Subject: | optimizer tuning/forcing correct index use |
Date: | 2002-03-19 13:22:49 |
Message-ID: | 02031907224907.00735@krbdev |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
Greetings,
I have a query that sometimes refuses to use a sensible index.
Some info first:
Version: PostgreSQL v 7.2
OS: SuSE Linux 7.2 w/Kernel 2.4.14
HW: Dual Athlon MP1800, 768MB RAM
DB exists on a single fast SCSI drive.
DB was loaded with a copy of our production system data (currently
running on a SQLServer database). 'vacuumdb -a -z' was run after
load, no rows have been added or deleted since vacuum.
Here is the query:
explain analyze
select
t.tb_order_number, f.account_number, f.clearing_account_id,
f.symbol_full_name, f.fill_quantity, f.buy_or_sell,
f.trader_username, f.fill_price, f.fill_quantity, f.fill_ts,
f.last_update_ts, s.symbol_type, s.base_symbol_full_name,
s.symbol_name, ca.clearing_account_number, ca.clearing_firm,
co.tick_value
from
fill f,
tb_order t,
symbol s,
clearing_account ca,
contract co
where
f.fill_ts > '2002-02-06 00:00:00' and
f.fill_ts <= '2002-02-08 23:59:59' and
t.order_id = f.order_id and
s.full_name = f.symbol_full_name and
co.contract_name = s.contract_name and
ca.clearing_account_id = f.clearing_account_id and
s.symbol_name not in ('ISLD:TESTA', 'ISLD:TESTB', 'ISLD:TESTC');
Total number of rows in the tables of the join:
fill : 7674725
tb_order : 20059204
symbol : 559
clearing_account : 57
contract : 13
When the time range (defined by fill_ts) is under 48 hours, the
following execution plan is used:
Nested Loop (cost=35.98..477477.54 rows=37403 width=170) (actual time=6.38..7868.65 rows=143582 loops=1)
-> Hash Join (cost=35.98..256958.14 rows=37403 width=152) (actual time=6.21..3937.04 rows=143582 loops=1)
-> Hash Join (cost=1.71..256229.53 rows=45364 width=90) (actual time=0.50..2685.68 rows=143582 loops=1)
-> Index Scan using fill_ak2 on fill f (cost=0.00..255314.58 rows=69239 width=66) (actual time=0.26..1875.38 rows=143582 loops=1)
-> Hash (cost=1.57..1.57 rows=57 width=24) (actual time=0.20..0.20 rows=0 loops=1)
-> Seq Scan on clearing_account ca (cost=0.00..1.57 rows=57 width=24) (actual time=0.05..0.14 rows=57 loops=1)
-> Hash (cost=32.89..32.89 rows=547 width=62) (actual time=5.66..5.66 rows=0 loops=1)
-> Hash Join (cost=1.16..32.89 rows=547 width=62) (actual time=0.21..4.73 rows=559 loops=1)
-> Seq Scan on symbol s (cost=0.00..20.78 rows=547 width=48) (actual time=0.05..2.43 rows=559 loops=1)
-> Hash (cost=1.13..1.13 rows=13 width=14) (actual time=0.11..0.11 rows=0 loops=1)
-> Seq Scan on contract co (cost=0.00..1.13 rows=13 width=14) (actual time=0.06..0.08 rows=13 loops=1)
-> Index Scan using order_pk on tb_order t (cost=0.00..5.88 rows=1 width=18) (actual time=0.02..0.02 rows=1 loops=143582)
Total runtime: 7934.64 msec
Somewhere above 48 hours the plan changes to this:
Nested Loop (cost=35.98..644155.97 rows=56104 width=170) (actual time=48504.60..128512.95 rows=196483 loops=1)
-> Hash Join (cost=35.98..313376.24 rows=56104 width=152) (actual time=48450.25..121546.48 rows=196483 loops=1)
-> Hash Join (cost=1.71..312300.45 rows=68045 width=90) (actual time=48444.62..119822.68 rows=196483 loops=1)
-> Seq Scan on fill f (cost=0.00..310928.88 rows=103859 width=66) (actual time=48444.38..118737.56 rows=196483 loops=1)
-> Hash (cost=1.57..1.57 rows=57 width=24) (actual time=0.19..0.19 rows=0 loops=1)
-> Seq Scan on clearing_account ca (cost=0.00..1.57 rows=57 width=24) (actual time=0.05..0.13 rows=57 loops=1)
-> Hash (cost=32.89..32.89 rows=547 width=62) (actual time=5.57..5.57 rows=0 loops=1)
-> Hash Join (cost=1.16..32.89 rows=547 width=62) (actual time=0.21..4.64 rows=559 loops=1)
-> Seq Scan on symbol s (cost=0.00..20.78 rows=547 width=48) (actual time=0.05..2.34 rows=559 loops=1)
-> Hash (cost=1.13..1.13 rows=13 width=14) (actual time=0.11..0.11 rows=0 loops=1)
-> Seq Scan on contract co (cost=0.00..1.13 rows=13 width=14) (actual time=0.07..0.09 rows=13 loops=1)
-> Index Scan using order_pk on tb_order t (cost=0.00..5.88 rows=1 width=18) (actual time=0.03..0.03 rows=1 loops=196483)
Total runtime: 128598.23 msec
The critical part is driving the fill table query from a sequential scan
rather than a scan of fill_ak2.
I believe I understand why the optimizer changes the plan: it thinks
that the larger time range will make the fill_ak2 index scan a more
expensive option than a sequential scan of the fill table. Can
someone recommend a way for me to show the optimizer the error of its
ways? (short of using 'set enable_seqscan = no', that is).
Incidentally, in our current SQL server database I use a hint to tell
the optimizer specifically what index to use. When using Oracle
databases in the past, I've had to do the same thing. I understand
the PostgreSQL developers are against this approach, preferring to
make the optimizer smarter instead. I agree with this in principal,
however, it seems to me that until the optimizer is perfect, this type
of feature is needed. Comments?
-K
From | Date | Subject | |
---|---|---|---|
Next Message | Jan Wieck | 2002-03-19 14:25:37 | Re: create function problem |
Previous Message | Kancha . | 2002-03-19 10:03:43 | Re: create function problem |