From: | Reece Hart <reece(at)harts(dot)net> |
---|---|
To: | SF PostgreSQL <sfpug(at)postgresql(dot)org> |
Subject: | query plans and immutable functions |
Date: | 2006-05-09 17:51:00 |
Message-ID: | 1147197060.27835.54.camel@tallac.gene.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | sfpug |
I've got a simple 1-predicate query that uses an index when the
condition is literal but does a seq scan when the condition uses the
result of an immutable function. This leads to terrible performance in
the latter case. I believe that this is a bug candidate and I'll
forward it to pgsql-bugs if other sfpug folks agree.
Environs:
unison(at)csb-dev=> select version();
version
------------------------------------------------------------------------------------------------------------------
PostgreSQL 8.1.3 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.0.2 20050901 (prerelease) (SUSE Linux)
unison(at)csb-dev=> \d pseq
Table "unison.pseq"
Column | Type | Modifiers
---------+--------------------------+--------------------------------------------------------
pseq_id | integer | not null default nextval('pseq_pseq_id_seq'::regclass)
seq | text | not null
len | integer | not null
md5 | character(32) | not null
added | timestamp with time zone | not null default now()
Indexes:
"pseq_pkey" PRIMARY KEY, btree (pseq_id) CLUSTER
"pseq_md5" UNIQUE, btree (md5)
"pseq_seqhash" UNIQUE, btree (seqhash(seq))
"pseq_added" btree (added)
"pseq_len" btree (len)
Triggers:
pseq_iu_trigger BEFORE INSERT OR UPDATE ON pseq FOR EACH ROW EXECUTE PROCEDURE pseq_iu_trigger()
unison(at)csb-dev=> analyze pseq;
ANALYZE
Time: 75.583 ms
Protein sequences are stored in the database. pseq_iu_trigger fills len
and md5. The pseq_md5 index is UNIQUE above, but that was for testing
this problem. The behavior is the same when it is not unique.
clean_sequence(<sequence>) returns a canonical version of the sequence.
It's implemented in C and is marked immutable:
unison(at)csb-dev=> \df+ unison.clean_sequence
List of functions
Schema | Name | Result data type | Argument data types | Owner | Language | Source code | Description
--------+----------------+------------------+---------------------+----------+----------+-------------------+-------------
unison | clean_sequence | text | text | postgres | c | pg_clean_sequence |
unison(at)csb-dev=> select proname,provolatile from pg_proc where proname='clean_sequence';
proname | provolatile
----------------+-------------
clean_sequence | i
I'm using PostgreSQL's md5 (pg_catalog), which is also implemented in C
and marked immutable.
Now the problem: I'd like to look up a sequence by md5. When I do this
using the function in the predicate, I get:
unison(at)csb-dev=> select pseq_id from pseq where md5=md5(clean_sequence('SDVLELTDEN'));
pseq_id
---------
7387455
Time: 7044.530 ms
unison(at)csb-dev=> explain select pseq_id from pseq where md5=md5(clean_sequence('SDVLELTDEN'));
QUERY PLAN
--------------------------------------------------------------------
Seq Scan on pseq (cost=0.00..441233.47 rows=36772 width=4)
Filter: ((md5)::text = 'f6c80739f2f752132b89535c070192b9'::text)
Time: 1.493 ms
unison(at)csb-dev=> explain select pseq_id from pseq where md5=md5('SDVLELTDEN');
QUERY PLAN
--------------------------------------------------------------------
Seq Scan on pseq (cost=0.00..441233.47 rows=36772 width=4)
Filter: ((md5)::text = 'f6c80739f2f752132b89535c070192b9'::text)
Note that the function was replaced with its result in the plan.
However, when the literal value is used in the original query, I get a
plan that uses the index and much better performance:
unison(at)csb-dev=> explain select pseq_id from pseq where md5='f6c80739f2f752132b89535c070192b9';
QUERY PLAN
---------------------------------------------------------------------
Index Scan using pseq_md5 on pseq (cost=0.00..6.01 rows=1 width=4)
Index Cond: (md5 = 'f6c80739f2f752132b89535c070192b9'::bpchar)
(2 rows)
Time: 1.402 ms
unison(at)csb-dev=> select pseq_id from pseq where md5='f6c80739f2f752132b89535c070192b9';
pseq_id
---------
7387455
(1 row)
Time: 1.424 ms
So, it appears to me that although the planner recognized that the
function call is immutable during the query and may be replaced by its
return value, it didn't use this information when constructing the query
plan.
Feedback?
Thanks,
Reece
--
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0
From | Date | Subject | |
---|---|---|---|
Next Message | Reece Hart | 2006-05-09 18:16:58 | Re: query plans and immutable functions |
Previous Message | Josh Berkus | 2006-05-09 00:23:45 | Carpool tommorrow |