Re: Overhauling GUCS

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org, Peter Eisentraut <peter_e(at)gmx(dot)net>, Greg Smith <gsmith(at)gregsmith(dot)com>, Andreas Pflug <pgadmin(at)pse-consulting(dot)de>, Decibel! <decibel(at)decibel(dot)org>
Subject: Re: Overhauling GUCS
Date: 2008-06-09 17:45:00
Message-ID: 484D6C1C.5070004@agliodbs.com
Lists: pgsql-hackers

Tom,

>>> Actually, the reason it's still 10 is that the effort expended to get it
>>> changed has been *ZERO*. I keep asking for someone to make some
>>> measurements, do some benchmarking, anything to make a plausible case
>>> for a specific higher value as being a reasonable place to set it.
>>> The silence has been deafening.
>> Not surprising really. It is a simple adjustment to make, and it is also
>> easy to spot when it's a problem. However, it is not trivial to test for
>> (in terms of time and effort). I know 10 is wrong and so do you. If you
>> don't, I am curious why I see so many posts from you saying, "Your
>> estimates are off, what is your default_statistics_target?" with yet
>> even more responses saying, "Uhh, 10."

I tried (back in 7.4) to do some systematic testing of this. The
problem is that the cases where a higher d_s_t is required are
specifically ones with complex, unbalanced data distributions and/or
very large databases. This makes test cases extremely difficult and
time-consuming to generate; further, I found that the test cases I had
from my clients' databases were not portable (in addition to being
confidential).

Also, I'd actually assert that "10" seems to be perfectly adequate for
the majority of users. That is, the number of users for whom I've
recommended increasing d_s_t for the whole database is smaller than the
number for whom I haven't, and of course we never hear from most users
at all. So I'm pretty happy recommending: "Leave the default. If you
encounter problem queries, increase it to 100 and analyze the database.
If you're running a data warehouse, increase it to 1000."
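In configuration terms, the advice above might look like this (a sketch only;
the values 100 and 1000 come from the post, and the per-column form is an
alternative the post doesn't mention):

```sql
-- Database-wide, for problem queries (or set
-- default_statistics_target = 100 in postgresql.conf):
ALTER DATABASE mydb SET default_statistics_target = 100;
ANALYZE;  -- re-collect stats so the new target takes effect

-- Data warehouse: go higher still.
ALTER DATABASE mydw SET default_statistics_target = 1000;

-- Or, more surgically, raise the target only on a troublesome column:
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;
ANALYZE orders;
```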

Where analyze does systematically fall down is with databases over 500GB
in size, but that's not a function of d_s_t but rather of our tiny
sample size.
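For scale (my sketch, not a claim from the thread): ANALYZE's sample size is
fixed per target, roughly 300 rows per unit of statistics target in analyze.c,
so the fraction of a table actually read shrinks as the table grows:

```sql
-- Assumed heuristic: ANALYZE samples ~300 * statistics_target rows.
-- At the default target of 10, that's 3,000 rows regardless of table size.
SELECT rows,
       round(least(1.0, (300 * 10) / rows::numeric) * 100, 4) AS pct_sampled
FROM (VALUES (10000::bigint), (1000000), (100000000)) AS t(rows);
```

A 100-million-row table gets summarized from about 3,000 rows, which is why
sample size, not d_s_t, becomes the limiting factor on very large databases.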

--Josh
