Re: Bad query plan inside EXISTS clause

From: Benoit Delbosc <bdelbosc(at)nuxeo(dot)com>
To: Yeb Havinga <yebhavinga(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Bad query plan inside EXISTS clause
Date: 2010-03-10 15:36:06
Message-ID: 4B97BC66.2030500@nuxeo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Yeb Havinga a écrit :
> Yeb Havinga wrote:
>> Kenneth Marshall wrote:
>>> EXISTS matches NULLs too and since they are not indexed a
>>> sequential scan is needed to check for them. Try using
>>> IN instead.
>>>
>> This is nonsense in more than one way.
> Hit ctrl-return a bit too slow - exists does not match null but a set of
> records, that is either empty or not empty. Also it is possible to index
> table columns with nulls, and then the indexes can still be used.
> Besides filtering record sets with expressions, indexes are also used
> for ordering. There the effect of indexes with nulls can be seen: where
> to put them: in front or after the non nulls? So indexes can be
> perfectly used in conjunction with nulls. I found the original mail
> rather intriguing and played with an example myself a bit, but could not
> repeat the behavior (9.0 devel version), in my case the exists used an
> index. Maybe it has something to do with the fact that the planner
> estimates to return 50000 rows, even when the actual numbers list only 1
> hit. In the exists case, it can stop at the first hit. In the select all
> rows case, it must return all rows. Maybe a better plan emerges with
> better statistics?

Thanks for your quick investigation.

Changing the target statistics to 1000 for the column and analyzing the
table has not changed the query plan.

I just notice that using a "LIMIT 1" also gives a bad query plan

EXPLAIN ANALYZE SELECT 1 FROM read_acls_cache WHERE users_md5 =
'9bc9012eb29c0bb2ae3cc7b5e78c2acf' LIMIT 1;
QUERY PLAN

---------------------------------------------------------------------------------------
Limit (cost=0.00..1.19 rows=1 width=0) (actual time=366.771..366.772
rows=1 loops=1)
-> Seq Scan on read_acls_cache (cost=0.00..62637.01 rows=52517
width=0) (actual time=366.769..366.769 rows=1 loops=1)
Filter: ((users_md5)::text =
'9bc9012eb29c0bb2ae3cc7b5e78c2acf'::text)
Total runtime: 366.806 ms
(4 rows)

Perhaps the EXISTS clause is also trying to limit the sub select query.

There are no NULL value on this column but only 49 distinct values for
2.5m rows, around 51k rows per value.

What I am trying to do is to know if a value is present or not in the
table, so far the fastest way is to use a COUNT:

EXPLAIN ANALYZE SELECT COUNT(1) FROM read_acls_cache WHERE users_md5 =
'9bc9012eb29c0bb2ae3cc7b5e78c2acf';
QUERY PLAN
---------------------------------------------------------------------------
Aggregate (cost=35154.28..35154.29 rows=1 width=0) (actual
time=20.242..20.242 rows=1 loops=1)
-> Bitmap Heap Scan on read_acls_cache (cost=2176.10..35022.98
rows=52517 width=0) (actual time=6.937..15.025 rows=51446 loops=1)
Recheck Cond: ((users_md5)::text =
'9bc9012eb29c0bb2ae3cc7b5e78c2acf'::text)
-> Bitmap Index Scan on read_acls_cache_users_md5_idx
(cost=0.00..2162.97 rows=52517 width=0) (actual time=6.835..6.835
rows=51446 loops=1)
Index Cond: ((users_md5)::text =
'9bc9012eb29c0bb2ae3cc7b5e78c2acf'::text)
Total runtime: 20.295 ms
(6 rows)

ben

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2010-03-10 15:44:20 Re: Bad query plan inside EXISTS clause
Previous Message Yeb Havinga 2010-03-10 14:05:20 Re: Bad query plan inside EXISTS clause