From: | "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com> |
---|---|
To: | "David West" <david(dot)west(at)cusppoint(dot)com> |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: limit clause breaks query planner? |
Date: | 2008-09-01 19:17:17 |
Message-ID: | 162867790809011217n587e6090qdec2c4c6a1f580e6@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hello
2008/9/1 David West <david(dot)west(at)cusppoint(dot)com>:
> Thanks for your suggestion but the result is the same.
>
> Here is the explain analyse output from different queries.
> Select * from my_table where A is null and B = '21' limit 15
>
> "Limit (cost=0.00..3.68 rows=15 width=128) (actual time=85837.043..85896.140 rows=15 loops=1)"
> " -> Seq Scan on my_table this_ (cost=0.00..258789.88 rows=1055580 width=128) (actual time=85837.038..85896.091 rows=15 loops=1)"
> " Filter: ((A IS NULL) AND ((B)::text = '21'::text))"
> "Total runtime: 85896.214 ms"
>
I see it - problem is in statistics - system expect 1055580, but there
is only 15 values.
try
a) increase statistics on column a and b - probably there are strong
dependency between column a nad b, because statistic are totally out
b) try cursors
declare cursor c as Select * from my_table where A is null and B =
'21' limit 15;
fetch forward 15 from c;
http://www.postgresql.org/docs/8.2/static/sql-fetch.html
maybe
select * from (select * from mytable where b = '21' offset 0) where a
is null limit 15
regards
Pavel Stehule
> As you can see the estimated cost was 3.68: a long way from the true value.
>
> Doing 'set enable_seqscan=false' and repeating the select:
> "Limit (cost=0.00..5.58 rows=15 width=128) (actual time=4426.438..4426.834 rows=15 loops=1)"
> " -> Index Scan using idx_A on my_table this_ (cost=0.00..392956.76 rows=1055970 width=128) (actual time=4426.433..4426.768 rows=15 loops=1)"
> " Index Cond: (A IS NULL)"
> " Filter: ((B)::text = '21'::text)"
> "Total runtime: 4426.910 ms"
>
> Probably some caching made this query faster, but it's still too slow, and using the wrong index.
>
> Deleting index A gives:
> "Limit (cost=0.00..56.47 rows=15 width=128) (actual time=10.298..10.668 rows=15 loops=1)"
> " -> Index Scan using idx_B on my_table this_ (cost=0.00..3982709.15 rows=1057960 width=128) (actual time=10.293..10.618 rows=15 loops=1)"
> " Index Cond: ((B)::text = '21'::text)"
> " Filter: (A IS NULL)"
> "Total runtime: 10.735 ms"
> Much better. However I need index A for another query so I can't just delete it.
>
> Looking at the estimated cost, you can see why it's choosing the order that it is choosing, but it just doesn't seem to reflect reality at all.
>
> Now here's the result of the query, with both indexes in place and sequential scan enabled
> Select * from my_table where A is null and B = '21'
> "Bitmap Heap Scan on my_table this_ (cost=20412.89..199754.37 rows=1060529 width=128) (actual time=470.772..7432.062 rows=1020062 loops=1)"
> " Recheck Cond: ((B)::text = '21'::text)"
> " Filter: (A IS NULL)"
> " -> Bitmap Index Scan on idx_B (cost=0.00..20147.76 rows=1089958 width=0) (actual time=466.545..466.545 rows=1020084 loops=1)"
> " Index Cond: ((B)::text = '21'::text)"
> "Total runtime: 8940.119 ms"
>
> In this case it goes for the correct index. It appears that the query planner makes very simplistic assumptions when it comes to LIMIT?
>
> Thanks
> David
>
> -----Original Message-----
> From: Pavel Stehule [mailto:pavel(dot)stehule(at)gmail(dot)com]
> Sent: 01 September 2008 13:53
> To: David West
> Cc: pgsql-performance(at)postgresql(dot)org
> Subject: Re: [PERFORM] limit clause breaks query planner?
>
> Hello
>
> you should partial index
>
> create index foo(b) on mytable where a is null;
>
> regards
> Pavel Stehule
>
> 2008/9/1 David West <david(dot)west(at)cusppoint(dot)com>:
>> Hi,
>>
>>
>>
>> I have a single table with about 10 million rows, and two indexes. Index A
>> is on a column A with 95% null values. Index B is on a column B with about
>> 10 values, ie. About a million rows of each value.
>>
>>
>>
>> When I do a simple query on the table (no joins) with the following
>> condition:
>>
>> A is null AND
>>
>> B = '21'
>>
>>
>>
>> it uses the correct index, index B. However, when I add a limit clause of
>> 15, postgres decides to do a sequential scan :s. Looking at the results
>> from explain:
>>
>>
>>
>> "Limit (cost=0.00..3.69 rows=15 width=128)"
>>
>> " -> Seq Scan on my_table this_ (cost=0.00..252424.24 rows=1025157
>> width=128)"
>>
>> " Filter: ((A IS NULL) AND ((B)::text = '21'::text))"
>>
>>
>>
>> It appears that postgres is (very incorrectly) assuming that it will only
>> have to retrieve 15 rows on a sequential scan, and gives a total cost of
>> 3.69. In reality, it has to scan many rows forward until it finds the
>> correct value, yielding very poor performance for my table.
>>
>>
>>
>> If I disable sequential scan (set enable_seqscan=false) it then incorrectly
>> uses the index A that has 95% null values: it seems to incorrectly apply the
>> same logic again that it will only have to retrieve 15 rows with the limit
>> clause, and thinks that the index scan using A is faster than index scan B.
>>
>>
>>
>> Only by deleting the index on A and disabling sequential scan will it use
>> the correct index, which is of course by far the fastest.
>>
>>
>>
>> Is there an assumption in the planner that a limit of 15 will mean that
>> postgres will only have to read 15 rows? If so is this a bad assumption?
>> If a particular query is faster without a limit, then surely it will also be
>> faster with the limit.
>>
>>
>>
>> Any workarounds for this?
>>
>>
>>
>> Thanks
>>
>> David
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>
From | Date | Subject | |
---|---|---|---|
Next Message | Gregory Stark | 2008-09-01 19:41:48 | Re: slow update of index during insert/copy |
Previous Message | Scott Carey | 2008-09-01 18:46:59 | Re: slow update of index during insert/copy |