From: | "Gary Doades" <gpd(at)gpdnet(dot)co(dot)uk> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: planner/optimizer question |
Date: | 2004-04-29 22:31:00 |
Message-ID: | 40919034.20317.153A0B6F@localhost |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
On 29 Apr 2004 at 17:54, Tom Lane wrote:
> "Gary Doades" <gpd(at)gpdnet(dot)co(dot)uk> writes:
> > In this example the statistics don't matter.
>
> Don't they?
>
> A prior poster mentioned that he thought MSSQL tries to keep all its
> indexes in memory. I wonder whether you are giving Postgres a fair
> chance to do the same. What postgresql.conf settings are you using?
>
> regards, tom lane
As far as I understand it the statistics only contribute to determining the query plan.
Once the access methods are determined, the stats don't matter during the running of
the query.
I believe I have given Postgres exactly the same chance. The data is small enough to fit
into RAM (all the tables in the query add up to around 50meg) and I executed the query
several times to get a consistent figure for the explain analyze.
Having picked out an index scan as being the highest time user I concentrated on that in
this case and compared the same index scan on MSSQL. At least MSSQL reported it as
an index scan on the same index for the same number of rows.
There was nothing wrong with the query plan that Postgres used. As far as I could see it
was probably the best one to use, it just physically took longer than the same access
plan on MSSQL.
The query and plan are included below, the main thing I was looking at was the index
scan on staff_booking_pkey being 676ms long.
The only postgresql.conf parameters changed from the default are:
shared_buffers = 3000
sort_mem = 4096
effective_cache_size = 15000
default_statistics_target = 100
There was no disk IO (above the small background IO) during the final run of the query
as reported by vmstat (Task Mangler on Windows).
SELECT B.CONTRACT_ID,SUM(R.DURATION+1)/60.0 AS SUMDUR FROM
SEARCH_REQT_RESULT TSR
JOIN STAFF_BOOKING B ON (B.STAFF_ID = TSR.STAFF_ID)
JOIN ORDER_REQT R ON (R.REQT_ID = B.REQT_ID)
JOIN BOOKING_PLAN BP ON (BP.BOOKING_ID = B.BOOKING_ID) AND
BP.BOOKING_DATE BETWEEN '2004-04-12' AND '2004-04-18' AND
TSR.SEARCH_ID = 8 GROUP BY B.CONTRACT_ID
QUERY PLAN
HashAggregate (cost=11205.80..11209.81 rows=401 width=6) (actual
time=1179.729..1179.980 rows=50 loops=1)
-> Nested Loop (cost=326.47..11203.79 rows=401 width=6) (actual
time=39.700..1177.149 rows=652 loops=1)
-> Hash Join (cost=326.47..9990.37 rows=401 width=8) (actual
time=39.537..1154.807 rows=652 loops=1)
Hash Cond: ("outer".staff_id = "inner".staff_id)
-> Merge Join (cost=320.39..9885.06 rows=3809 width=12) (actual
time=38.316..1143.953 rows=4079 loops=1)
Merge Cond: ("outer".booking_id = "inner".booking_id)
-> Index Scan using staff_booking_pkey on staff_booking b
(cost=0.00..8951.94 rows=222612 width=16) (actual time=0.218..676.219 rows=222609
loops=1)
-> Sort (cost=320.39..329.91 rows=3808 width=4) (actual
time=26.225..32.754 rows=4079 loops=1)
Sort Key: bp.booking_id
-> Index Scan using booking_plan_idx2 on booking_plan bp
(cost=0.00..93.92 rows=3808 width=4) (actual time=0.223..14.186 rows=4079 loops=1)
Index Cond: ((booking_date >= '2004-04-12'::date) AND
(booking_date <= '2004-04-18'::date))
-> Hash (cost=5.59..5.59 rows=193 width=4) (actual time=1.139..1.139
rows=0 loops=1)
-> Index Scan using fk_idx_search_reqt_result on search_reqt_result tsr
(cost=0.00..5.59 rows=193 width=4) (actual time=0.213..0.764 rows=192 loops=1)
Index Cond: (search_id = 8)
-> Index Scan using order_reqt_pkey on order_reqt r (cost=0.00..3.01 rows=1
width=6) (actual time=0.023..0.025 rows=1 loops=652)
Index Cond: (r.reqt_id = "outer".reqt_id)
Total runtime: 1181.239 ms
Cheers,
Gary.
From | Date | Subject | |
---|---|---|---|
Next Message | Gary Doades | 2004-04-29 23:01:10 | Re: planner/optimizer question |
Previous Message | Josh Berkus | 2004-04-29 22:10:08 | Re: Simply join in PostrgeSQL takes too long |