
Re: benchmarking the query planner

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	"Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Greg Stark" <stark(at)enterprisedb(dot)com>, "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>, "jd(at)commandprompt(dot)com" <jd(at)commandprompt(dot)com>, "Josh Berkus" <josh(at)agliodbs(dot)com>, "Greg Smith" <gsmith(at)gregsmith(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: benchmarking the query planner
Date:	2008-12-12 14:35:09
Message-ID:	5301.1229092509@sss.pgh.pa.us
Lists:	pgsql-hackers

"Robert Haas" <robertmhaas(at)gmail(dot)com> writes:
> On Fri, Dec 12, 2008 at 4:04 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>> The existing sampling mechanism is tied to solid statistics.
>>
>> Sounds great, but it's not true. The sample size is not linked to data
>> volume, so how can it possibly give a consistent confidence range?

> It is a pretty well-known mathematical fact that for something like an
> opinion poll your margin of error does not depend on the size of the
> population but only on the size of your sample.

Right. The solid math that Greg referred to concerns how big a sample
we need in order to have good confidence in the histogram results.
It doesn't speak to whether we get good results for ndistinct (or for
most-common-values, though in practice that seems to work fairly well).
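The poll analogy can be checked with the textbook margin-of-error formula for a proportion, z*sqrt(p(1-p)/n), plus the finite population correction for sampling without replacement. This is a generic statistics sketch, not code from ANALYZE. It assumes p = 0.5 (the worst case), z = 1.96 (roughly 95% confidence), and made-up table sizes.

```python
import math


def margin_of_error(p, n, population=None, z=1.96):
    """Approximate margin of error for a proportion p estimated from n sampled rows.

    If `population` is given, apply the finite population correction for
    sampling without replacement; otherwise treat the population as infinite.
    """
    moe = z * math.sqrt(p * (1 - p) / n)
    if population is not None:
        moe *= math.sqrt((population - n) / (population - 1))
    return moe


if __name__ == "__main__":
    # Same sample size, tables three orders of magnitude apart.
    for rows in (10**6, 10**9):
        print(f"table={rows:>13,}  n=1,000  margin={margin_of_error(0.5, 1000, rows):.4f}")
    # Quadrupling the sample halves the margin, whatever the table size.
    print(f"n=4,000  margin={margin_of_error(0.5, 4000, 10**9):.4f}")
```

Going from a million rows to a billion moves the margin only in the fifth decimal place; the sample size is what matters. That reasoning covers histogram-style quantities, not ndistinct.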

AFAICS, marginal enlargements in the sample size aren't going to help
much for ndistinct --- you really need to look at most or all of the
table to be guaranteed anything about that.
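A sketch of why extra sample rows buy little for ndistinct. It uses the Haas and Stokes "Duj1" estimator, n*d / (n - f1 + f1*n/N), where d is the number of distinct values in the sample and f1 the number seen exactly once. As far as we understand, this is the formula ANALYZE applies, but the real code has additional special cases not reproduced here. The column is hypothetical: half its rows hold 0 and the other half are all distinct, so the true ndistinct is N/2 + 1.

```python
import random
from collections import Counter


def duj1_estimate(sample, total_rows):
    """Haas-Stokes Duj1 estimate of distinct values: n*d / (n - f1 + f1*n/N)."""
    n = len(sample)
    counts = Counter(sample)
    d = len(counts)                                     # distinct values seen
    f1 = sum(1 for c in counts.values() if c == 1)      # values seen exactly once
    if f1 == n:
        # No repeats in the sample at all: treat the column as unique.
        return float(total_rows)
    est = n * d / (n - f1 + f1 * n / total_rows)
    # Clamp to the range the data can justify: [d, N].
    return min(max(est, float(d)), float(total_rows))


def make_column(total_rows):
    # Hypothetical skewed column: first half all 0, second half all unique.
    half = total_rows // 2
    return [0 if i < half else i for i in range(total_rows)]


if __name__ == "__main__":
    N = 1_000_000
    column = make_column(N)
    true_ndistinct = len(set(column))  # N/2 + 1
    rng = random.Random(42)
    for n in (30_000, 60_000, 120_000, N):
        sample = [column[i] for i in rng.sample(range(N), n)]
        est = duj1_estimate(sample, N)
        print(f"sample={n:>9,}  estimate={est:>12,.0f}  true={true_ndistinct:,}")
```

In a run like this the estimate grows roughly in step with the sample size but stays far below the true 500,001 until the sample covers the whole table. The singletons in the sample look the same whether the unsampled rows hold more unique values or more zeros.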

But having said that, I have wondered whether we should consider
allowing the sample to grow to fill maintenance_work_mem, rather than
making it a predetermined number of rows. One difficulty is that the
random-sampling code assumes it has a predetermined rowcount target;
I haven't looked at whether that'd be easy to change or whether we'd
need a whole new sampling algorithm.
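One conceivable shape for a memory-driven target, offered purely as an assumption and not as anything in PostgreSQL, is a textbook reservoir sampler (Algorithm R) that starts with a generous target and halves it whenever the retained rows exceed a byte budget. Halving keeps a uniform random subset of a uniform sample, which is still uniform, so the reservoir invariant survives. `BudgetedReservoir` and `budget_bytes` are hypothetical names; `budget_bytes` stands in for maintenance_work_mem, and `len(row)` stands in for a row's real memory footprint.

```python
import random


class BudgetedReservoir:
    """Uniform row sample whose size is capped by a byte budget, not a row count."""

    def __init__(self, budget_bytes, initial_target, rng=None):
        self.budget = budget_bytes
        self.target = initial_target  # current reservoir size k
        self.rng = rng or random.Random()
        self.rows = []
        self.used = 0  # bytes currently held
        self.seen = 0  # rows offered so far

    def _shrink(self):
        # Halve k and keep a uniform random subset of the current sample.
        self.target = max(1, self.target // 2)
        keep = min(self.target, len(self.rows))
        self.rows = self.rng.sample(self.rows, keep)
        self.used = sum(len(r) for r in self.rows)

    def add(self, row):
        self.seen += 1
        if len(self.rows) < self.target:
            # Still filling: every row offered so far is kept.
            self.rows.append(row)
            self.used += len(row)
        else:
            # Algorithm R: row number `seen` replaces a random slot with prob k/seen.
            j = self.rng.randrange(self.seen)
            if j >= self.target:
                return
            self.used += len(row) - len(self.rows[j])
            self.rows[j] = row
        # Over budget: shrink k until the retained rows fit.
        while self.used > self.budget and self.target > 1:
            self._shrink()


if __name__ == "__main__":
    sampler = BudgetedReservoir(budget_bytes=10_000, initial_target=100_000,
                                rng=random.Random(7))
    for i in range(50_000):
        sampler.add(f"row-{i:06d}".encode() + b"x" * 90)  # 100-byte rows
    print(sampler.target, len(sampler.rows), sampler.used)
```

With 100-byte rows and a 10,000-byte budget the target settles at 97 rows. The power-of-two halving leaves some of the budget unused, which is one reason a purpose-built algorithm might still be preferable.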

regards, tom lane

In response to

Re: benchmarking the query planner at 2008-12-12 11:44:21 from Robert Haas

Responses

Re: benchmarking the query planner at 2008-12-12 14:50:41 from Greg Stark
Re: benchmarking the query planner at 2008-12-12 15:06:49 from Simon Riggs
