From: | "Nick Fankhauser" <nickf(at)ontko(dot)com> |
---|---|
To: | "Pgsql-Performance(at)Postgresql(dot) Org" <pgsql-performance(at)postgresql(dot)org> |
Subject: | Re: How to make n_distinct more accurate. |
Date: | 2003-09-24 02:32:24 |
Message-ID: | NEBBLAAHGLEEPCGOBHDGAEKMIJAA.nickf@ontko.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
The performance list seemed to be off-line for a while, so I posed the same
question on the admin list and Tom Lane has been helping in that forum.
-Nick
> -----Original Message-----
> From: pgsql-performance-owner(at)postgresql(dot)org
> [mailto:pgsql-performance-owner(at)postgresql(dot)org]On Behalf Of Nick
> Fankhauser
> Sent: Monday, September 22, 2003 3:42 PM
> To: Pgsql-Performance(at)Postgresql(dot) Org
> Subject: [PERFORM] How to make n_distinct more accurate.
>
>
> Hi-
>
> I have a table- called "event" with a field event_date_time that
> is indexed.
> There are 1,700,000 rows in the table and 92,000 distinct values of
> event_date_time with anywhere from 1 to 400 rows sharing the same
> value. (I
> did a count grouped by event_date_time & scanned it to get this info.)
>
> When I look at the pg_stats on this table, I always see 15,000 or lower in
> the n_distinct column for event_date_time. (I re-ran analyze
> several times &
> then checked pg_stats to see if the numbers varied significantly.)
>
> Since this is off by about a factor of 6, I think the planner is
> missing the
> chance to use this table as the "driver" in a complex query plan that I'm
> trying to optimize.
>
> So the question is- how can I get a better estimate of n_distinct from
> analyze?
>
> If I alter the stats target as high as it will go, I get closer, but it
> still shows the index to be about 1/2 as selective as it actually is:
>
> alpha=# alter table event alter column event_date_time set
> statistics 1000;
> ALTER TABLE
> alpha=# analyze event;
> ANALYZE
> alpha=# select n_distinct from pg_stats where tablename='event' and
> attname='event_date_time';
> n_distinct
> ------------
> 51741
> (1 row)
>
> This number seems to be consistently around 51,000 if I re-run
> analyze a few
> times.
>
> I guess my question is two-part:
>
> (1)Is there any tweak to make this estimate work better?
>
> (2)Since I'm getting numbers that are consistent but way off, is
> there a bug
> here?
>
> (2-1/2) Or alternately, am I totally missing what n-distinct is
> supposed to
> denote?
>
> Thanks!
> -Nick
>
> ---------------------------------------------------------------------
> Nick Fankhauser
>
> nickf(at)doxpop(dot)com Phone 1.765.965.7363 Fax 1.765.962.9788
> doxpop - Court records at your fingertips - http://www.doxpop.com/
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
>
From | Date | Subject | |
---|---|---|---|
Next Message | Dennis Bjorklund | 2003-09-24 05:20:44 | Re: LIKE query running slow |
Previous Message | Rod Taylor | 2003-09-24 00:53:57 | Re: LIKE query running slow |