Re: [Gmod-schema] Re: performace problem after VACUUM ANALYZE

From: ShengQiang Shu <sshu(at)fruitfly(dot)org>
To: Scott Cain <cain(at)cshl(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org, gmod schema <gmod-schema(at)lists(dot)sourceforge(dot)net>
Subject: Re: [Gmod-schema] Re: performace problem after VACUUM ANALYZE
Date: 2003-02-14 21:04:05
Message-ID: 3E4D59C5.304@fruitfly.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


PG really does not do the right thing as I mentioned earlier for joining
tables. To force not to use seqscan, it still does not use right index
(srcfeature_id, nbeg, nend) and performance is even worse.

c_gadfly3=# \d fl_src_b_e_key;
Index "public.fl_src_b_e_key"
Column | Type
---------------+---------
srcfeature_id | integer
nbeg | integer
nend | integer
btree, for table "public.featureloc"

c_gadfly3=# explain analyze select * from featureloc fl, feature f where
f.feature_id = fl.feature_id and srcfeature_id=1 and (nbeg >= 1000 and
nend <= 2000);

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=115700.27..232516.85 rows=69007 width=445) (actual
time=12342.97..12461.23 rows=32 loops=1)
Merge Cond: ("outer".feature_id = "inner".feature_id)
-> Index Scan using feature_pkey on feature f
(cost=0.00..110535.53 rows=2060653 width=361) (actual time=17.85..490.86
rows=28341 loops=1)
-> Sort (cost=115700.27..115872.79 rows=69007 width=84) (actual
time=11944.23..11944.25 rows=32 loops=1)
Sort Key: fl.feature_id
-> Seq Scan on featureloc fl (cost=0.00..107580.43
rows=69007 width=84) (actual time=375.85..11944.10 rows=32 loops=1)
Filter: ((srcfeature_id = 1) AND (nbeg >= 1000) AND
(nend <= 2000))
Total runtime: 12461.37 msec
(8 rows)

c_gadfly3=#
c_gadfly3=# set enable_seqscan=0;
SET
c_gadfly3=# explain analyze select * from featureloc fl, feature f where
f.feature_id = fl.feature_id and srcfeature_id=1 and (nbeg >= 1000 and
nend <= 2000);

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=0.00..236345.49 rows=69007 width=445) (actual
time=721.75..26078.64 rows=32 loops=1)
Merge Cond: ("outer".feature_id = "inner".feature_id)
-> Index Scan using fl_feature_id_key on featureloc fl
(cost=0.00..119701.43 rows=69007 width=84) (actual time=549.14..25854.12
rows=32 loops=1)
Filter: ((srcfeature_id = 1) AND (nbeg >= 1000) AND (nend <=
2000))
-> Index Scan using feature_pkey on feature f
(cost=0.00..110535.53 rows=2060653 width=361) (actual time=50.95..200.37
rows=28342 loops=1)
Total runtime: 26078.80 msec
(6 rows)

Scott Cain wrote:

> An update: I ran alter table as suggested, ie,
>
> alter table featureloc alter srcfeature_id set statistics 100;
>
> on each column in the table, running vacuum analyze and explain analyze
> on the query in between each alter to see if it made any difference. It
> did not. Postgres still instists on doing a seq scan on featureloc:
>
> Unique (cost=336831.46..337179.45 rows=2320 width=47) (actual
> time=27219.62..27220.30 rows=179 loops=1)
> -> Sort (cost=336831.46..336831.46 rows=23200 width=47) (actual
> time=27219.61..27219.80 rows=186 loops=1)
> -> Nested Loop (cost=0.00..334732.77 rows=23200 width=47)
> (actual time=1003.04..27217.99 rows=186 loops=1)
> -> Seq Scan on featureloc fl (cost=0.00..261709.31
> rows=23200 width=14) (actual time=814.68..26094.18 rows=186 loops=1)
> -> Index Scan using feature_pkey on feature f
> (cost=0.00..3.14 rows=1 width=33) (actual time=6.03..6.03 rows=1
> loops=186)
> Total runtime: 27220.63 msec
>
>
> On Fri, 2003-02-14 at 12:29, Scott Cain wrote:
>
> >Tom,
> >
> >Sorry about that: I'll try to briefly give the information you are
> >looking for. I've read the docs on ALTER TABLE, but it is not clear to
> >me what columns I should change STATISTICS on, or should I just do it on
> >all of the columns for which indexes exist?
> >
> >Here's the query again:
> >
> >select distinct f.name,fl.nbeg,fl.nend,fl.strand,f.type_id,f.feature_id
> > from feature f, featureloc fl
> > where
> > fl.srcfeature_id = 1 and
> > ((fl.strand=1 and fl.nbeg <= 393164 and fl.nend >= 390956) OR
> > (fl.strand=-1 and fl.nend <= 393164 and fl.nbeg >= 390956)) and
> > f.feature_id = fl.feature_id
> >
> >--------------------------------------------------------------------------
> >
> >Naive database:
> >
> >Unique (cost=75513.46..75513.48 rows=1 width=167) (actual
> >time=22815.25..22815.93 rows=179 loops=1)
> > -> Sort (cost=75513.46..75513.46 rows=1 width=167) (actual
> >time=22815.24..22815.43 rows=186 loops=1)
> > -> Nested Loop (cost=0.00..75513.45 rows=1 width=167) (actual
> >time=2471.25..22814.01 rows=186 loops=1)
> > -> Index Scan using featureloc_idx2 on featureloc fl
> >(cost=0.00..75508.43 rows=1 width=14) (actual time=2463.83..22796.50
> >rows=186 loops=1)
> > -> Index Scan using feature_pkey on feature f
> >(cost=0.00..5.01 rows=1 width=153) (actual time=0.08..0.08 rows=1
> >loops=186)
> >Total runtime: 22816.63 msec
> >--------------------------------------------------------------------------
> >
> >Naive database after featureloc_idx2 dropped:
> >
> >Unique (cost=75545.46..75545.48 rows=1 width=167) (actual
> >time=5232.36..5234.51 rows=179 loops=1)
> > -> Sort (cost=75545.46..75545.46 rows=1 width=167) (actual
> >time=5232.35..5232.54 rows=186 loops=1)
> > -> Nested Loop (cost=0.00..75545.45 rows=1 width=167) (actual
> >time=291.46..5220.69 rows=186 loops=1)
> > -> Index Scan using featureloc_src_strand_beg_end on
> >featureloc fl (cost=0.00..75540.43 rows=1 width=14) (actual
> >time=291.30..5214.46 rows=186 loops=1)
> > -> Index Scan using feature_pkey on feature f
> >(cost=0.00..5.01 rows=1 width=153) (actual time=0.02..0.03 rows=1
> >loops=186)
> >Total runtime: 5234.89 msec
> >--------------------------------------------------------------------------
> >
> >Database after VACUUM ANALYZE was run:
> >
> >Unique (cost=344377.70..344759.85 rows=2548 width=47) (actual
> >time=26466.82..26467.51 rows=179 loops=1)
> > -> Sort (cost=344377.70..344377.70 rows=25477 width=47) (actual
> >time=26466.82..26467.01 rows=186 loops=1)
> > -> Nested Loop (cost=0.00..342053.97 rows=25477 width=47)
> >(actual time=262.66..26465.63 rows=186 loops=1)
> > -> Seq Scan on featureloc fl (cost=0.00..261709.31
> >rows=25477 width=14) (actual time=118.62..26006.05 rows=186 loops=1)
> > -> Index Scan using feature_pkey on feature f
> >(cost=0.00..3.14 rows=1 width=33) (actual time=2.45..2.46 rows=1
> >loops=186)
> >Total runtime: 26467.85 msec
> >--------------------------------------------------------------------------
> >
> >After disallowing seqscans (set enable_seqscan=0):
> >
> >Unique (cost=356513.46..356895.61 rows=2548 width=47) (actual
> >time=27494.62..27495.34 rows=179 loops=1)
> > -> Sort (cost=356513.46..356513.46 rows=25477 width=47) (actual
> >time=27494.61..27494.83 rows=186 loops=1)
> > -> Nested Loop (cost=0.00..354189.73 rows=25477 width=47)
> >(actual time=198.88..27493.48 rows=186 loops=1)
> > -> Index Scan using featureloc_idx1 on featureloc fl
> >(cost=0.00..273845.08 rows=25477 width=14) (actual time=129.30..27280.95
> >rows=186 loops=1)
> > -> Index Scan using feature_pkey on feature f
> >(cost=0.00..3.14 rows=1 width=33) (actual time=1.13..1.13 rows=1
> >loops=186)
> >Total runtime: 27495.66 msec
> >--------------------------------------------------------------------------
> >
> >After dropping featureloc_idx1:
> >
> >Unique (cost=1310195.21..1310577.36 rows=2548 width=47) (actual
> >time=21692.69..21693.37 rows=179 loops=1)
> > -> Sort (cost=1310195.21..1310195.21 rows=25477 width=47) (actual
> >time=21692.69..21692.88 rows=186 loops=1)
> > -> Nested Loop (cost=0.00..1307871.48 rows=25477 width=47)
> >(actual time=2197.65..21691.39 rows=186 loops=1)
> > -> Index Scan using featureloc_idx2 on featureloc fl
> >(cost=0.00..1227526.82 rows=25477 width=14) (actual
> >time=2197.49..21618.89 rows=186 loops=1)
> > -> Index Scan using feature_pkey on feature f
> >(cost=0.00..3.14 rows=1 width=33) (actual time=0.37..0.38 rows=1
> >loops=186)
> >Total runtime: 21693.72 msec
> >--------------------------------------------------------------------------
> >
> >After dropping featureloc_idx2:
> >
> >Unique (cost=1414516.98..1414899.13 rows=2548 width=47) (actual
> >time=1669.17..1669.86 rows=179 loops=1)
> > -> Sort (cost=1414516.98..1414516.98 rows=25477 width=47) (actual
> >time=1669.17..1669.36 rows=186 loops=1)
> > -> Nested Loop (cost=0.00..1412193.25 rows=25477 width=47)
> >(actual time=122.69..1668.08 rows=186 loops=1)
> > -> Index Scan using featureloc_src_strand_beg_end on
> >featureloc fl (cost=0.00..1331848.60 rows=25477 width=14) (actual
> >time=122.51..1661.81 rows=186 loops=1)
> > -> Index Scan using feature_pkey on feature f
> >(cost=0.00..3.14 rows=1 width=33) (actual time=0.02..0.03 rows=1
> >loops=186)
> >Total runtime: 1670.20 msec
> >
> >
> >On Fri, 2003-02-14 at 12:00, Tom Lane wrote:
> >
> >>Scott Cain writes:
> >>
> >>>[ much stuff ]
> >>
> >>Could we see EXPLAIN ANALYZE, not just EXPLAIN, output for all these
> >>alternatives? Your question boils down to "why is the planner
> >>misestimating these queries" ... which is a difficult question to
> >>answer when given only the estimates and not the reality.
> >>
> >>A suggestion though is that you might need to raise the statistics
> >>target on the indexed columns, so that ANALYZE will collect
> >>finer-grained statistics. (See ALTER TABLE ... SET STATISTICS.)
> >>Try booting it up to 100 (from the default 10), re-analyze, and
> >>then see if/how the plans change.
> >>
> >> regards, tom lane

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Tilo Schwarz 2003-02-14 21:55:51 Re: Changing the default configuration (was Re: [pgsql-advocacy]
Previous Message Scott Cain 2003-02-14 19:22:51 Re: [Gmod-schema] Re: performace problem after VACUUM