From: anthony(dot)shipman(at)symstream(dot)com
To: pgsql-performance(at)postgresql(dot)org
Cc: tv(at)fuzzy(dot)cz
Subject: Re: strange query plan with LIMIT
Date: 2011-06-08 05:08:05
Message-ID: 201106081508.05668.anthony.shipman@symstream.com
Lists: pgsql-performance
On Wednesday 08 June 2011 02:40, tv(at)fuzzy(dot)cz wrote:
> Hi,
>
> why are you reposting this? Pavel Stehule already recommended you to run
> ANALYZE on the tdiag table - have you done that? What was the effect?
The mailing list system hiccupped and I ended up with two posts.

VACUUM ANALYZE was done, more than once. Setting the statistics target on the
diag_id column to 1000 only seemed to make the query a bit slower.
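For the record, the statistics target was raised roughly like this (followed by a
fresh ANALYZE so the new target takes effect):

    -- raise the per-column statistics target, then re-gather stats
    alter table tdiag alter column diag_id set statistics 1000;
    analyze tdiag;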
>
> The stats are off - e.g. the bitmap scan says
>
> -> Bitmap Heap Scan on tdiag (cost=25763.48..638085.13 rows=1141019
> width=114) (actual time=43.232..322.441 rows=86530 loops=1)
>
> so it expects to get 1141019 rows but it gets 86530, i.e. about 7% of the
> expected number. That might be enough to cause bad plan choice and thus
> performance issues.
What seems odd to me is that the only difference between the two queries is the
limit clause:
    select * from tdiag
    where (create_time >= '2011-06-03 09:49:04.000000+0'
       and create_time <  '2011-06-06 09:59:04.000000+0')
    order by diag_id limit 1;

    select * from tdiag
    where (create_time >= '2011-06-03 09:49:04.000000+0'
       and create_time <  '2011-06-06 09:59:04.000000+0')
    order by diag_id;
and yet the plan completely changes.
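(As an aside, a trick that is sometimes suggested for exactly this situation is
to order by an expression rather than the bare column, which stops the planner
from walking the diag_id index and hoping to hit a matching row early:

    select * from tdiag
    where (create_time >= '2011-06-03 09:49:04.000000+0'
       and create_time <  '2011-06-06 09:59:04.000000+0')
    order by diag_id + 0 limit 1;

With "diag_id + 0" the index on diag_id no longer provides the sort order, so
the planner should fall back to the same bitmap-scan-plus-sort plan as the
unlimited query. I haven't benchmarked this variant here.)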
I think that I have to force the evaluation order to get a reliably fast
result:
    begin;

    create temporary table tt on commit drop as
      select diag_id from tdiag
       where create_time >= '2011-06-03 09:49:04.000000+0'
         and create_time <  '2011-06-06 09:59:04.000000+0';

    select * from tdiag where diag_id in (select * from tt)
     order by diag_id limit 10;

    commit;
                                                                 QUERY PLAN
    -------------------------------------------------------------------------------------------------------------------------------------------
     Limit  (cost=3566.24..3566.27 rows=10 width=112) (actual time=1800.699..1800.736 rows=10 loops=1)
       ->  Sort  (cost=3566.24..3566.74 rows=200 width=112) (actual time=1800.694..1800.708 rows=10 loops=1)
             Sort Key: tdiag.diag_id
             Sort Method:  top-N heapsort  Memory: 18kB
             ->  Nested Loop  (cost=1360.00..3561.92 rows=200 width=112) (actual time=269.087..1608.324 rows=86530 loops=1)
                   ->  HashAggregate  (cost=1360.00..1362.00 rows=200 width=4) (actual time=269.052..416.898 rows=86530 loops=1)
                         ->  Seq Scan on tt  (cost=0.00..1156.00 rows=81600 width=4) (actual time=0.020..120.323 rows=86530 loops=1)
                   ->  Index Scan using tdiag_pkey on tdiag  (cost=0.00..10.99 rows=1 width=112) (actual time=0.006..0.008 rows=1 loops=86530)
                         Index Cond: (tdiag.diag_id = tt.diag_id)
     Total runtime: 1801.290 ms
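(The rows=200 on the HashAggregate is the planner's default guess for a table
with no statistics; temporary tables are never touched by autovacuum, so an
explicit

    analyze tt;

between the create and the final select should replace that guess with real
numbers and may improve the plan further.)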
>
> And yet another recommendation - the sort is performed on disk, so give it
> more work_mem and it should be much faster (should change from "merge
> sort" to "quick sort"). Try something like work_mem=20MB and see if it
> does the trick.
This certainly speeds up the sorting.
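For anyone following along, that can be tried per session without touching
postgresql.conf (the default work_mem on the 8.x series is only 1MB):

    set work_mem = '20MB';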
>
> regards
> Tomas
--
Anthony Shipman | What most people think about
Anthony(dot)Shipman(at)symstream(dot)com | most things is mostly wrong.