Re: More thoughts about planner's cost estimates

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: More thoughts about planner's cost estimates
Date:	2006-06-02 19:23:34
Message-ID:	200606021223.35168.josh@agliodbs.com
Lists:	pgsql-hackers

Greg,

> Using a variety of synthetic and real-world data sets, we show that
> distinct sampling gives estimates for distinct values queries that
> are within 0%-10%, whereas previous methods were typically 50%-250% off,
> across the spectrum of data sets and queries studied.

Aha. It's a question of the level of error permissible. For our
estimates, being 100% off is actually OK. That's why I was looking at 5%
block sampling; it stays within +/- 50% of the true n-distinct in 95% of
cases.
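
For illustration, here is a minimal sketch of estimating n-distinct from a 5% block sample. The message does not say which estimator was used; the Haas-Stokes style formula n*d / (n - f1 + f1*n/N) below is an assumption, and the function names and the list-of-lists table layout are hypothetical.

```python
import random
from collections import Counter


def sample_blocks(blocks, fraction, rng):
    """Pick a random subset of whole blocks and return all rows in them."""
    k = max(1, round(len(blocks) * fraction))
    chosen = rng.sample(range(len(blocks)), k)
    return [value for i in chosen for value in blocks[i]]


def estimate_n_distinct(sample, total_rows):
    """Estimate distinct values in the full table from a sample.

    Assumed estimator: n*d / (n - f1 + f1*n/N), where n is the sample
    size, N the table size, d the distinct values seen in the sample,
    and f1 the number of values seen exactly once.
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must not be empty")
    counts = Counter(sample)
    d = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)
    denominator = n - f1 + f1 * n / total_rows
    # Clamp so floating-point rounding never reports more than N values.
    return min(float(total_rows), n * d / denominator)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical table: 1000 blocks of 100 rows, values drawn from 5000.
    table = [[rng.randrange(5000) for _ in range(100)] for _ in range(1000)]
    rows = sample_blocks(table, 0.05, rng)
    true_distinct = len({v for block in table for v in block})
    print(true_distinct, estimate_n_distinct(rows, 100 * 1000))
```

Block sampling reads whole pages, so it is cheap in I/O terms, but rows within a block are often correlated. How close the estimate lands therefore depends heavily on how the data is physically clustered.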

> Doing a bit of basic searching around I think the tool we're looking for
> here is called a "chi-squared test for independence".

Augh. I wrote a program (in Pascal) to do this back in 1988. Now I can't
remember the math. For a two-column test it's relatively
computation-light, though, as I recall ... but I don't remember whether
standard chi-square works with a random sample.
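
As a reminder of the math, a minimal sketch of Pearson's chi-squared statistic for independence between two columns follows. It is not the 1988 Pascal program, and the function name and input format (an iterable of value pairs) are hypothetical. For each cell the expected count is row_total * column_total / n, and the statistic sums (observed - expected)^2 / expected over all cells.

```python
from collections import Counter


def chi_squared_independence(pairs):
    """Return (statistic, degrees_of_freedom) for two columns.

    `pairs` is an iterable of (a, b) tuples, one per row.
    """
    pairs = list(pairs)
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least one row")

    joint = Counter(pairs)                 # observed count per (a, b) cell
    row_totals = Counter(a for a, _ in pairs)
    col_totals = Counter(b for _, b in pairs)

    statistic = 0.0
    for a, row_total in row_totals.items():
        for b, col_total in col_totals.items():
            expected = row_total * col_total / n
            observed = joint.get((a, b), 0)
            statistic += (observed - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


if __name__ == "__main__":
    independent = [(a, b) for a in "xy" for b in "pq"] * 25
    correlated = [("x", "p")] * 50 + [("y", "q")] * 50
    print(chi_squared_independence(independent))
    print(chi_squared_independence(correlated))
```

A large statistic relative to the chi-squared distribution with the returned degrees of freedom suggests the columns are not independent. The cost is one pass over the rows plus a loop over the cells of the contingency table, so it is cheap for low-cardinality columns but grows with the product of the two columns' distinct counts.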

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

In response to

Re: More thoughts about planner's cost estimates at 2006-06-02 17:08:13 from Greg Stark

Responses

Re: More thoughts about planner's cost estimates at 2006-06-02 20:23:14 from Greg Stark
