From: | Scott Cain <cain(at)cshl(dot)org> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | stan letovsky <SLetovsky(at)aol(dot)com>, pgsql-performance(at)postgresql(dot)org, gmod schema <gmod-schema(at)lists(dot)sourceforge(dot)net> |
Subject: | Re: [Gmod-schema] Re: performace problem after VACUUM |
Date: | 2003-02-15 20:36:31 |
Message-ID: | 1045341390.3944.678.camel@localhost.localdomain |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hello Tom,
Here's the short answer: I've got it working much faster now (>100 msec
for the query by explain analyze).
Here's the long answer: I reworked the table, horribly denormalizing
it. I changed the coordinate system, so that start is always less than
end, regardless of strand. Here is the original query:
select distinct f.name,fl.nbeg,fl.nend,fl.strand,f.type_id,f.feature_id
from feature f, featureloc fl
where
fl.srcfeature_id = 1 and
((fl.strand=1 and fl.nbeg <= 393164 and fl.nend >= 390956) OR
(fl.strand=-1 and fl.nend <= 393164 and fl.nbeg >= 390956)) and
f.feature_id = fl.feature_id
and here is the equivalent query in the new coordinate system:
select distinct f.name,fl.nbeg,fl.nend,fl.strand,f.type_id,f.feature_id
from feature f, featureloc fl
where
fl.srcfeature_id = 1 and
f.feature_id = fl.feature_id and
fl.max >= 390956 and
fl.min <= 393164
Notice that it is MUCH simpler, and the query planner uses exactly the
indexes I want, and as noted above, runs much faster. Of course, this
also means that I have to rewrite my database adaptor, but it shouldn't
be too bad.
For those on the GMOD list, here is how I changed the table:
alter table featureloc add column min int;
alter table featureloc add column max int;
update featureloc set min=nbeg where strand=1;
update featureloc set max=nend where strand=1;
update featureloc set max=nbeg where strand=-1;
update featureloc set min=nend where strand=-1;
update featureloc set min=nbeg where (strand=0 or strand is null) and nbeg<nend;
update featureloc set max=nend where (strand=0 or strand is null) and nbeg<nend;
update featureloc set min=nend where (strand=0 or strand is null) and nbeg>nend;
update featureloc set max=nbeg where (strand=0 or strand is null) and nbeg>nend;
create index featureloc_src_min_max on featureloc (srcfeature_id,min,max);
select count(*) from featureloc where min is null and nbeg is not null;
The last select is just a test to make sure I didn't miss anything, and
it did return zero. Also, it doesn't appear that there are any features
that are strandless. I found that a little surprising, but included
those updates for completeness.
Tom, thank you much for your help. Hopefully, I will get the group to
buy into this schema change, and life will be good.
Scott
On Fri, 2003-02-14 at 19:11, Tom Lane wrote:
> SLetovsky(at)aol(dot)com writes:
> > Am I correct in interpreting your comments as saying you believe that
> > if we could lose the OR and the strand constraint PG would probably
> > use the index properly?
>
> No, I said I thought it could do it without that ;-). But yes, you'd
> have a much less fragile query if you could lose the OR condition.
>
> Have you looked into using a UNION ALL instead of OR to merge the two
> sets of results? It sounds grotty, but might be faster...
>
> regards, tom lane
--
------------------------------------------------------------------------
Scott Cain, Ph. D. cain(at)cshl(dot)org
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory
From | Date | Subject | |
---|---|---|---|
Next Message | Mariusz Czułada | 2003-02-15 23:48:13 | Views with unions |
Previous Message | Christopher Kings-Lynne | 2003-02-15 14:15:01 | Re: Offering tuned config files |