From: | Matteo Beccati <php(at)beccati(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Enhanced containment selectivity function |
Date: | 2005-08-04 14:36:54 |
Message-ID: | 42F22806.20503@beccati.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers pgsql-patches |
Hi,
I've recently had problems with slow queries caused by the selectivity
of the <@ ltree operator, as you may see in my post here:
http://archives.postgresql.org/pgsql-performance/2005-07/msg00473.php
Someone on IRC (AndrewSN if I'm not wrong) pointed out that the
restriction selectivity function for <@ is contsel, which returns a
constant value of 0.001. So I started digging in the source code trying
to understand how the default behaviour could be enhanced, and ended up
writing a little patch which adds an alternative containment selectivity
function (called "contstatsel") which is able to deliver better results.
This first version is based on the eqsel function and uses only
histogram values to calculate the selectivity and uses the 0.001
constant as a fallback.
This also made me think: is there a reason why geometric selectivity
functions return constant values rather than checking statistics for a
better result?
Attached you will find a patch suitable for current CVS HEAD. My C
skills are a bit rusty and my knowledge of pg internals are very poor,
so I'm sure it could be improved and modified to better fit the pg
coding standards.
Here are the results on a slow query:
test=# EXPLAIN ANALYZE SELECT * FROM gw_users JOIN gw_batches USING
(u_id) WHERE tree <@ '1041' AND t_stamp > '2005-07-01';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..553.02 rows=8 width=364) (actual
time=2.423..19787.259 rows=6785 loops=1)
-> Index Scan using gw_users_gisttree_key on gw_users
(cost=0.00..21.63 rows=5 width=156) (actual time=0.882..107.434
rows=4696 loops=1)
Index Cond: (tree <@ '1041'::ltree)
-> Index Scan using gw_batches_t_stamp_u_id_key on gw_batches
(cost=0.00..106.09 rows=15 width=212) (actual time=3.898..4.171 rows=1
loops=4696)
Index Cond: ((gw_batches.t_stamp > '2005-07-01
00:00:00+02'::timestamp with time zone) AND ("outer".u_id =
gw_batches.u_id))
Total runtime: 19805.447 ms
(6 rows)
test=# EXPLAIN ANALYZE SELECT * FROM gw_users JOIN gw_batches USING
(u_id) WHERE tree <<@ '1041' AND t_stamp > '2005-07-01';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=245.26..1151.80 rows=7671 width=364) (actual
time=69.562..176.966 rows=6785 loops=1)
Hash Cond: ("outer".u_id = "inner".u_id)
-> Bitmap Heap Scan on gw_batches (cost=57.74..764.39 rows=8212
width=212) (actual time=8.330..39.542 rows=7819 loops=1)
Recheck Cond: (t_stamp > '2005-07-01 00:00:00+02'::timestamp
with time zone)
-> Bitmap Index Scan on gw_batches_t_stamp_u_id_key
(cost=0.00..57.74 rows=8212 width=0) (actual time=8.120..8.120 rows=7819
loops=1)
Index Cond: (t_stamp > '2005-07-01
00:00:00+02'::timestamp with time zone)
-> Hash (cost=175.79..175.79 rows=4692 width=156) (actual
time=61.046..61.046 rows=4696 loops=1)
-> Seq Scan on gw_users (cost=0.00..175.79 rows=4692
width=156) (actual time=0.083..34.200 rows=4696 loops=1)
Filter: (tree <<@ '1041'::ltree)
Total runtime: 194.621 ms
(10 rows)
The second query uses a custom <<@ operator I added to test the
alternative selectivity function:
CREATE FUNCTION contstatsel(internal, oid, internal, integer) RETURNS
double precision AS 'contstatsel' LANGUAGE internal;
CREATE OPERATOR <<@ (
LEFTARG = ltree,
LEFTARG = ltree,
PROCEDURE = ltree_risparent,
COMMUTATOR = '@>',
RESTRICT = contstatsel,
JOIN = contjoinsel
);
Of course any comments/feedback are welcome.
Best regards
--
Matteo Beccati
http://phpadsnew.com/
http://phppgads.com/
Attachment | Content-Type | Size |
---|---|---|
contstatsel.diff | text/plain | 6.1 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Mark Woodward | 2005-08-04 14:43:49 | Re: Solving the OID-collision problem |
Previous Message | Mark Woodward | 2005-08-04 14:26:38 | pg_dump -- data and schema only? |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2005-08-04 14:46:24 | Re: prevent encoding conversion recursive error |
Previous Message | Qingqing Zhou | 2005-08-04 08:53:51 | prevent encoding conversion recursive error |