Re: SoC Ideas for people looking for projects

From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: SoC Ideas for people looking for projects
Date: 2007-03-21 17:01:03
Message-ID: 60odmmjzu8.fsf@dba2.int.libertyrms.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

me(at)benjaminarai(dot)com (Benjamin Arai) writes:
> If you are looking for a SoC idea, I have listed a couple below. I
> am not sure how good of an idea they are but I have ran into the
> following limitations and probably other people have as well in the
> past.

Actually, I have a thought on a SoC idea...

The general notion would be to try to come up with some more rational
information on setting the default column statistics width.

http://www.postgresql.org/docs/8.2/interactive/runtime-config-query.html#GUC-DEFAULT-STATISTICS-TARGET
http://www.postgresql.org/docs/8.2/interactive/planner-stats.html

Now, the default value has long been 10. There are cases where people
find they need to set it higher; that has always been pretty
trial-and-error.

My suspicion is that:

a) The default should probably be a bit higher than 10

b) Some analysis of stats and schema on an individual table could
perhaps provide more specific values for specific columns.

- Data type might provide guidance; there's little need for >3 values on
a binary column, for instance.

- If there is a NOT NULL UNIQUE constraint on a column, that might
suggest > 10 values

- If the column is known to have 150 unique values, that might
suggest SET STATISTICS 150

It might be worth looking at the *least* frequently occuring
values, and set stats high enough to make it likely that at least
one such value would be pulled in...

- Some kinds of values (dates, floats) are sorta continuous in value;
having 10 bins may be pretty OK for such

There are probably some other heuristics to be had; this is just some
ideas off the top of my head.

Nobody has gone through any sort of real analysis of this; there
likely is merit to doing so...
--
let name="cbbrowne" and tld="cbbrowne.com" in name ^ "@" ^ tld;;
http://cbbrowne.com/info/finances.html
Where do you *not* want to go today? "Confutatis maledictis, flammis
acribus addictis" (<http://www.hex.net/~cbbrowne/msprobs.html>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Magnus Hagander 2007-03-21 17:05:24 Re: to_tsvector in 8.2.3
Previous Message Alan Hodgson 2007-03-21 16:54:41 Re: invalid byte sequence for encoding "UTF8"

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2007-03-21 17:27:49 Re: CREATE INDEX and HOT - revised design
Previous Message Pavan Deolasee 2007-03-21 16:59:56 Re: CREATE INDEX and HOT - revised design