Quick Links

Re: count(*), EXISTS, indexes

From:	Itai Zukerman <zukerman(at)math-hat(dot)com>
To:	josh(at)agliodbs(dot)com
Cc:	pgsql-sql(at)postgresql(dot)org
Subject:	Re: count(*), EXISTS, indexes
Date:	2003-04-12 04:01:49
Message-ID:	873ckoqs7m.fsf@matt.w80.math-hat.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-sql

>> They are my own operators and functions. profile is a integer array
>> and the ~'s are subset operators. psig is a bit signature, "~" is
>> complement, and the ~ operators again are subset operators.
>
> You're going to have to work on your question-posting skills.

*blush*, sorry.

> This explains why the planner thinks that the exists clause will return
> 255,000 rows instead of the handful it actually does return.

Actually:

# explain select sum(weight) from rprofile rp where exists (select 1 from rcount_prof rcp where rcp.profile ~<= rp.profile and ~rcp.psig ~<= rp.psig and rcp.filter='{734,1944}');
QUERY PLAN
----------------------------------------------------------------------------------------------------
Aggregate (cost=1544943.75..1544943.75 rows=1 width=4)
-> Seq Scan on rprofile rp (cost=0.00..1544255.00 rows=275500 width=4)
Filter: (subplan)
SubPlan
-> Seq Scan on rcount_prof rcp (cost=0.00..2.70 rows=1 width=0)
Filter: ((profile ~<= $0) AND ((~ psig) ~<= $1) AND (filter = '{734,1944}'::text))

I sort-of don't know what I'm doing reading these query plans, but I
think the estimates are more-or-less right: few rows from rcount_prof,
many rows from rprofile. 275,500 may be a bit high (it's actually
around 24,000 in this case), but def. within an order of magnitude.

> Or build your own custom index types to take advantage of your custom
> operators. B-tree indexes are optimized for =, LIKE, <, and > queries; I
> don't think they know what to do with "~<="

I created GiST indexes for the ~ operators (but, no, I haven't tuned
the selectivity estimation functions). The problem was, the query
wasn't using the index on rprofile. Stephan Szabo's suggestion worked
beautifully, though:

dl=# EXPLAIN ANALYZE
dl-# SELECT sum(weight)
dl-# FROM (SELECT DISTINCT ON(rp.rid) rp.rid, rp.weight
dl(# FROM rprofile rp, rcount_prof rcp
dl(# WHERE rcp.profile ~<= rp.profile
dl(# AND ~rcp.psig ~<= rp.psig
dl(# AND rcp.filter='{734,1944}') AS FOO;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=2245.08..2245.08 rows=1 width=1026) (actual time=954.45..954.45 rows=1 loops=1)
-> Subquery Scan foo (cost=2245.07..2245.08 rows=1 width=1026) (actual time=824.85..935.71 rows=23619 loops=1)
-> Unique (cost=2245.07..2245.08 rows=1 width=1026) (actual time=824.85..898.24 rows=23619 loops=1)
-> Sort (cost=2245.07..2245.08 rows=1 width=1026) (actual time=824.84..840.52 rows=23619 loops=1)
Sort Key: rp.rid
-> Nested Loop (cost=0.00..2245.06 rows=1 width=1026) (actual time=0.28..520.41 rows=23619 loops=1)
Join Filter: ("outer".profile ~<= "inner".profile)
-> Seq Scan on rcount_prof rcp (cost=0.00..2.44 rows=1 width=287) (actual time=0.06..0.33 rows=1 loops=1)
Filter: (filter = '{734,1944}'::text)
-> Index Scan using rprofile_profile_idx on rprofile rp (cost=0.00..2232.98 rows=551 width=739) (actual time=0.14..453.11 rows=23666 loops=1)
Index Cond: ((~ "outer".psig) ~<= rp.psig)
Total runtime: 958.62 msec

The row estimates are way off, though. My fault...

--
Itai Zukerman <http://www.math-hat.com/~zukerman/>

In response to

Re: count(*), EXISTS, indexes at 2003-04-11 22:36:23 from Josh Berkus

Browse pgsql-sql by date

	From	Date	Subject
Next Message	Randolph Jones	2003-04-13 01:50:07	sort up then down
Previous Message	Tom Lane	2003-04-12 00:00:06	Re: count(*), EXISTS, indexes