From: | Ray Ontko <rayo(at)ontko(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-admin(at)postgresql(dot)org |
Subject: | Re: Are statistics gathered on function indexes? |
Date: | 2002-06-28 15:10:36 |
Message-ID: | 200206281510.KAA04318@shire.ontko.com |
Lists: | pgsql-admin |
Tom, et al,
Hmm. Something is weird here. Watch this.
1) First we demonstrate that the row estimate drops to 1 between one- and two-letter prefixes.
2) Then we run an ANALYZE and things work the way you suggest.
3) Then we run some more VACUUM/ANALYZE variants.
4) Things revert to the old behavior.
5) Even after we re-analyze correctly, things still look broken.
Eek. Do you have an explanation for this behavior?
It appears that "vacuum analyze verbose actor" causes the problem,
and that I have to say "vacuum analyze actor" in order to clear out
the ill effects of having said "vacuum analyze verbose actor".
Typing "vacuum verbose analyze actor" doesn't clear things out,
but it doesn't produce the problem either.
Ray
P.S. psql (PostgreSQL) 7.2.1
**********
1) First we demonstrate that the row estimate drops to 1 between one- and two-letter prefixes.
**********
develop=# select count(*) from actor ;
count
--------
433902
(1 row)
develop=# explain select * from actor where actor_full_name like 'W%' ;
NOTICE: QUERY PLAN:
Seq Scan on actor (cost=0.00..12529.77 rows=3992 width=570)
EXPLAIN
develop=# explain select * from actor where actor_full_name like 'WI%' ;
NOTICE: QUERY PLAN:
Index Scan using actor_full_name on actor (cost=0.00..6.01 rows=1 width=570)
EXPLAIN
**********
2) Then we run an ANALYZE and things work the way you suggest.
**********
develop=# vacuum analyze verbose actor ;
NOTICE: --Relation actor--
NOTICE: Pages 7106: Changed 0, Empty 0; Tup 433902: Vac 0, Keep 0, UnUsed 1443.
Total CPU 0.23s/0.07u sec elapsed 0.30 sec.
NOTICE: Analyzing actor
VACUUM
develop=# \h vac
Command: VACUUM
Description: garbage-collect and optionally analyze a database
Syntax:
VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ table ]
VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] ANALYZE [ table [ (column [, ...] ) ] ]
develop=# vacuum verbose analyze actor ;
NOTICE: --Relation actor--
NOTICE: Pages 7106: Changed 0, Empty 0; Tup 433902: Vac 0, Keep 0, UnUsed 1443.
Total CPU 0.26s/0.05u sec elapsed 0.30 sec.
NOTICE: Analyzing actor
VACUUM
develop=# explain select * from actor where actor_full_name like 'W%' ;
NOTICE: QUERY PLAN:
Seq Scan on actor (cost=0.00..12529.77 rows=7468 width=571)
EXPLAIN
develop=# explain select * from actor where actor_full_name like 'WI%' ;
NOTICE: QUERY PLAN:
Index Scan using actor_full_name on actor (cost=0.00..447.49 rows=112 width=571)
EXPLAIN
develop=# explain select * from actor where actor_full_name like 'WIL%' ;
NOTICE: QUERY PLAN:
Index Scan using actor_full_name on actor (cost=0.00..9.61 rows=2 width=571)
EXPLAIN
**********
3) Then we run some more VACUUM/ANALYZE variants.
**********
develop=# vacuum verbose analyze actor ;
NOTICE: --Relation actor--
NOTICE: Pages 7106: Changed 0, Empty 0; Tup 433902: Vac 0, Keep 0, UnUsed 1443.
Total CPU 0.24s/0.06u sec elapsed 0.30 sec.
NOTICE: Analyzing actor
VACUUM
develop=# vacuum analyze actor verbose ;
ERROR: parser: parse error at or near "verbose"
develop=# vacuum verbose analyze actor ;
NOTICE: --Relation actor--
NOTICE: Pages 7106: Changed 0, Empty 0; Tup 433902: Vac 0, Keep 0, UnUsed 1443.
Total CPU 0.22s/0.08u sec elapsed 0.29 sec.
NOTICE: Analyzing actor
VACUUM
develop=# vacuum analyze verbose actor ;
NOTICE: --Relation actor--
NOTICE: Pages 7106: Changed 0, Empty 0; Tup 433902: Vac 0, Keep 0, UnUsed 1443.
Total CPU 0.24s/0.06u sec elapsed 0.30 sec.
NOTICE: Analyzing actor
VACUUM
**********
4) Then things revert to the old behavior.
**********
develop=# explain select * from actor where actor_full_name like 'W%' ;
NOTICE: QUERY PLAN:
Seq Scan on actor (cost=0.00..12529.77 rows=6244 width=571)
EXPLAIN
develop=# explain select * from actor where actor_full_name like 'WI%' ;
NOTICE: QUERY PLAN:
Index Scan using actor_full_name on actor (cost=0.00..6.01 rows=1 width=571)
EXPLAIN
develop=# explain select * from actor where actor_full_name like 'WIL%' ;
NOTICE: QUERY PLAN:
Index Scan using actor_full_name on actor (cost=0.00..6.01 rows=1 width=571)
EXPLAIN
develop=# explain select * from actor where actor_full_name like 'WILL%' ;
NOTICE: QUERY PLAN:
Index Scan using actor_full_name on actor (cost=0.00..6.01 rows=1 width=571)
EXPLAIN
**********
5) Even after we re-analyze correctly, things still look broken.
**********
develop=# vacuum verbose analyze actor ;
NOTICE: --Relation actor--
NOTICE: Pages 7106: Changed 0, Empty 0; Tup 433902: Vac 0, Keep 0, UnUsed 1443.
Total CPU 0.30s/0.01u sec elapsed 0.30 sec.
NOTICE: Analyzing actor
VACUUM
develop=# explain select * from actor where actor_full_name like 'W%' ;
NOTICE: QUERY PLAN:
Seq Scan on actor (cost=0.00..12529.77 rows=7130 width=571)
EXPLAIN
develop=# explain select * from actor where actor_full_name like 'WI%' ;
NOTICE: QUERY PLAN:
Index Scan using actor_full_name on actor (cost=0.00..6.01 rows=1 width=571)
EXPLAIN
develop=# explain select * from actor where actor_full_name like 'WIL%' ;
NOTICE: QUERY PLAN:
Index Scan using actor_full_name on actor (cost=0.00..6.01 rows=1 width=571)
EXPLAIN
develop=# explain select * from actor where actor_full_name like 'WILL%' ;
NOTICE: QUERY PLAN:
Index Scan using actor_full_name on actor (cost=0.00..6.01 rows=1 width=571)
EXPLAIN
> Ray Ontko <rayo(at)ontko(dot)com> writes:
> >> It's there already; what did you think was making the difference
> >> between W% and WI% ?
>
> > Yes, but the cost doesn't continue to decline if I make the LIKE
> > more and more restrictive by going from WI% to WIL% to WILL%, etc.
>
> Yes it does, if you have a large enough table. In most scenarios
> the selectivity drops off fast enough with larger strings that you
> hit the minimum estimate of 1 row pretty quickly; I suppose that's
> what's happening with your case. Here's an example using the 7.2
> regression-test database:
>
> -- update stats
> regression=# analyze road;
> ANALYZE
>
> -- now force planner to think "road" is much larger than it really is,
> -- else we can't see the change in estimate beyond WI%
> regression=# update pg_class set relpages = relpages * 10000,
> regression-# reltuples = reltuples * 10000 where relname = 'road';
> UPDATE 1
>
> regression=# explain select * from only road where name like 'W%';
> NOTICE: QUERY PLAN:
>
> Seq Scan on road (cost=0.00..1444625.00 rows=764903 width=89)
>
> EXPLAIN
> regression=# explain select * from only road where name like 'WI%';
> NOTICE: QUERY PLAN:
>
> Index Scan using rix on road (cost=0.00..25007.80 rows=8406 width=89)
>
> EXPLAIN
> regression=# explain select * from only road where name like 'WIJ%';
> NOTICE: QUERY PLAN:
>
> Index Scan using rix on road (cost=0.00..277.04 rows=92 width=89)
>
> EXPLAIN
> regression=# explain select * from only road where name like 'WIJK%';
> NOTICE: QUERY PLAN:
>
> Index Scan using rix on road (cost=0.00..5.28 rows=1 width=89)
>
> EXPLAIN
> regression=# explain select * from only road where name like 'WIJKL%';
> NOTICE: QUERY PLAN:
>
> Index Scan using rix on road (cost=0.00..5.23 rows=1 width=89)
>
> EXPLAIN
> regression=# explain select * from only road where name like 'WIJKLM%';
> NOTICE: QUERY PLAN:
>
> Index Scan using rix on road (cost=0.00..5.23 rows=1 width=89)
>
> EXPLAIN
> regression=#
>
> As you can see, the estimate drops off by about a factor of 90 per
> added character. This is probably too much, but it's not that easy
> to determine what the ratio ought to be. The critical code involved
> in this is convert_string_to_scalar in backend/utils/adt/selfuncs.c;
> the ratio per character is essentially the same as the character range
> that it induces from the available values. Feel free to propose a
> better implementation if you can think of one.
>
> regards, tom lane
>
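For what it's worth, the geometric drop-off Tom describes above can be
sketched numerically. This is my own minimal illustration, not the actual
convert_string_to_scalar code: I assume a flat character range of 27 per
position (the real code induces the range, roughly 90 in Tom's example,
from the observed values).

```python
def prefix_selectivity(prefix, char_range=27):
    """Each extra prefix character shrinks the matched fraction by char_range."""
    return char_range ** -len(prefix)

def estimated_rows(prefix, total_rows, char_range=27):
    """Planner-style row estimate, clamped to a minimum of one row."""
    return max(1, round(total_rows * prefix_selectivity(prefix, char_range)))

# With a 433902-row table (like my actor table), the estimate hits the
# 1-row floor after only a few prefix characters, which is why the plans
# above stop changing between WIL% and WILL%:
for p in ('W', 'WI', 'WIL', 'WILL'):
    print(p, estimated_rows(p, 433902))
```

With a char_range closer to the ~90 Tom's example induces, the floor is
reached even sooner, which would match seeing rows=1 already at two characters.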
----------------------------------------------------------------------
Ray Ontko rayo(at)ontko(dot)com Phone 1.765.935.4283 Fax 1.765.962.9788
Ray Ontko & Co. Software Consulting Services http://www.ontko.com/