From: anthony(dot)shipman(at)symstream(dot)com
To: pgsql-performance(at)postgresql(dot)org
Cc: tv(at)fuzzy(dot)cz
Subject: Re: strange query plan with LIMIT
Date: 2011-06-08 05:08:05
Message-ID: 201106081508.05668.anthony.shipman@symstream.com
Lists: pgsql-performance
On Wednesday 08 June 2011 02:40, tv(at)fuzzy(dot)cz wrote:
> Hi,
>
> why are you reposting this? Pavel Stehule already recommended you to run
> ANALYZE on the tdiag table - have you done that? What was the effect?
The mailing list system hiccupped and I ended up with two posts.

VACUUM ANALYZE was done, more than once. Setting the statistics target on the
diag_id column to 1000 only seemed to make the query a bit slower.
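For the record, the statistics target was raised roughly like this (followed by a
fresh ANALYZE so the new target takes effect):

    -- raise the per-column statistics target, then re-gather stats
    alter table tdiag alter column diag_id set statistics 1000;
    analyze tdiag;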
>
> The stats are off - e.g. the bitmap scan says
>
> -> Bitmap Heap Scan on tdiag (cost=25763.48..638085.13 rows=1141019
> width=114) (actual time=43.232..322.441 rows=86530 loops=1)
>
> so it expects to get 1141019 rows but it gets 86530, i.e. about 7% of the
> expected number. That might be enough to cause bad plan choice and thus
> performance issues.
What seems odd to me is that the only difference between the two queries is the
limit clause:
    select * from tdiag
    where (create_time >= '2011-06-03 09:49:04.000000+0'
       and create_time <  '2011-06-06 09:59:04.000000+0')
    order by diag_id limit 1;

    select * from tdiag
    where (create_time >= '2011-06-03 09:49:04.000000+0'
       and create_time <  '2011-06-06 09:59:04.000000+0')
    order by diag_id;
and yet the plan completely changes.
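(As an aside, a trick that is sometimes suggested for exactly this situation is
to order by an expression rather than the bare column, which stops the planner
from walking the diag_id index and hoping to hit a matching row early:

    select * from tdiag
    where (create_time >= '2011-06-03 09:49:04.000000+0'
       and create_time <  '2011-06-06 09:59:04.000000+0')
    order by diag_id + 0 limit 1;

With "diag_id + 0" the index on diag_id no longer provides the sort order, so
the planner should fall back to the same bitmap-scan-plus-sort plan as the
unlimited query. I haven't benchmarked this variant here.)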
I think that I have to force the evaluation order to get a reliably fast
result:
    begin;

    create temporary table tt on commit drop as
      select diag_id from tdiag
       where create_time >= '2011-06-03 09:49:04.000000+0'
         and create_time <  '2011-06-06 09:59:04.000000+0';

    select * from tdiag where diag_id in (select * from tt)
     order by diag_id limit 10;

    commit;
                                                                 QUERY PLAN
    -------------------------------------------------------------------------------------------------------------------------------------------
     Limit  (cost=3566.24..3566.27 rows=10 width=112) (actual time=1800.699..1800.736 rows=10 loops=1)
       ->  Sort  (cost=3566.24..3566.74 rows=200 width=112) (actual time=1800.694..1800.708 rows=10 loops=1)
             Sort Key: tdiag.diag_id
             Sort Method:  top-N heapsort  Memory: 18kB
             ->  Nested Loop  (cost=1360.00..3561.92 rows=200 width=112) (actual time=269.087..1608.324 rows=86530 loops=1)
                   ->  HashAggregate  (cost=1360.00..1362.00 rows=200 width=4) (actual time=269.052..416.898 rows=86530 loops=1)
                         ->  Seq Scan on tt  (cost=0.00..1156.00 rows=81600 width=4) (actual time=0.020..120.323 rows=86530 loops=1)
                   ->  Index Scan using tdiag_pkey on tdiag  (cost=0.00..10.99 rows=1 width=112) (actual time=0.006..0.008 rows=1 loops=86530)
                         Index Cond: (tdiag.diag_id = tt.diag_id)
     Total runtime: 1801.290 ms
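(The rows=200 on the HashAggregate is the planner's default guess for a table
with no statistics; temporary tables are never touched by autovacuum, so an
explicit

    analyze tt;

between the create and the final select should replace that guess with real
numbers and may improve the plan further.)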
>
> And yet another recommendation - the sort is performed on disk, so give it
> more work_mem and it should be much faster (should change from "merge
> sort" to "quick sort"). Try something like work_mem=20MB and see if it
> does the trick.
This certainly speeds up the sorting.
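For anyone following along, that can be tried per session without touching
postgresql.conf (the default work_mem on the 8.x series is only 1MB):

    set work_mem = '20MB';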
>
> regards
> Tomas
--
Anthony Shipman | What most people think about
Anthony(dot)Shipman(at)symstream(dot)com | most things is mostly wrong.