From: | Josh Berkus <josh(at)agliodbs(dot)com> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org, Dawid Kuroczko <qnex42(at)gmail(dot)com> |
Subject: | Re: When to bump up statistics? |
Date: | 2004-11-19 20:32:59 |
Message-ID: | 200411191232.59600.josh@agliodbs.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Dawid,
> I wonder what are the implications of using this statement,
> I know by using, say n=100, ANALYZE will take more time,
> pg_statistics will be bigger, planner will take longer time,
> on the other hand it will make better decisions... Etc, etc.
Yep. And pg_statistics will need to be vacuumed more often.
> Is bumping up statistics is only useful for indexed columns?
No. It's potentially useful for any queried column.
> 1) huge table with huge number of distinct values (_almost_
> unique ;))
Yes.
> 2) huge table with relatively equally distributed values
> (like each value is in between, say, 30-50 rows).
Not usually.
> 3) huge table with unequally distributed values (some
> values are in 1-5 rows, some are in 1000-5000 rows).
Yes.
> 4) huge table with small number values (around ~100
> distinct values, equally or uneqally distributed).
Not usually, especially if they are equally distributed.
> 5) boolean column.
Almost never, just as it is seldom useful to index a boolean column.
> I think SET STATISTICS 100 is very useful for case with
> unequally distributed values, but I wonder what about
> the other cases. And as a side note -- what are the
> reasonable bounds for statistics (between 10 and 100?)
Oh, no, I've used values up to 500 in production, and we've tested up to the
max on DBT-3. In my experience, if the default (10) isn't sufficient, you
often have to go up to > 250 to get a different plan.
> What are the runtime implications of setting statistics
> too large -- how much can it affect queries?
It won't affect select queries. It will affect ANALYZE time (substantially
in the aggregate) and maintenance on the pg_statistics table.
> And finally -- how other RDBMS and RDBM-likes deal
> with this issue? :)
Most don't allow such fine-tuned adjustment. MSSQL, for example, allows only
setting it per-table or maybe even database-wide, and on that platform it
doesn't seem to have much effect on query plans. Oracle prefers to use
HINTS, which are a brute-force method to manage query plans.
--
--Josh
Josh Berkus
Aglio Database Solutions
San Francisco
From | Date | Subject | |
---|---|---|---|
Next Message | Arshavir Grigorian | 2004-11-19 20:39:51 | Re: index use |
Previous Message | Stephan Szabo | 2004-11-19 20:29:42 | Re: index use |