From:       Josh Berkus <josh(at)agliodbs(dot)com>
To:         pgsql-perform <pgsql-performance(at)postgresql(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject:    Re: [HACKERS] Bad n_distinct estimation; hacks suggested?
Date:       2005-04-25 19:13:18
Message-ID: 200504251213.18565.josh@agliodbs.com
Lists:      pgsql-hackers pgsql-performance
Simon, Tom:
While it's not possible to get accurate estimates from a fixed size sample, I
think it would be possible from a small but scalable sample: say, 0.1% of all
data pages on large tables, up to the limit of maintenance_work_mem.
Setting up these samples as a % of data pages, rather than a pure random sort,
makes this more feasible; for example, a 70GB table would only need to sample
about 9000 data pages (or 70MB). Of course, larger samples would lead to
better accuracy, and this could be set through a revised GUC (e.g.,
maximum_sample_size, minimum_sample_size).
I just need a little help doing the math ... please?
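For concreteness, a minimal sketch of the arithmetic behind that 0.1% rule, clamped below by a minimum sample and above by what fits in maintenance_work_mem. The 0.1% fraction and the idea of min/max GUCs come from the proposal above; the 8 KB page size, the 3000-page minimum, and the 128 MB memory cap used in the example are illustrative assumptions, not part of the proposal.

```c
#include <stdio.h>

/*
 * Sketch of the proposed scaling rule: sample 0.1% of a table's data
 * pages, clamped below by a minimum sample size and above by what fits
 * in maintenance_work_mem.  BLCKSZ is the default PostgreSQL block
 * size; the exact clamping behaviour is an assumption for illustration.
 */
#define BLCKSZ          8192        /* default PostgreSQL block size */
#define SAMPLE_FRACTION 0.001       /* 0.1% of data pages */

static long long
proposed_sample_pages(long long rel_pages,
                      long long min_sample_pages,
                      long long maint_work_mem_kb)
{
    long long sample = (long long) (rel_pages * SAMPLE_FRACTION);
    long long mem_cap = (maint_work_mem_kb * 1024LL) / BLCKSZ;

    if (sample < min_sample_pages)
        sample = min_sample_pages;
    if (sample > mem_cap)
        sample = mem_cap;
    if (sample > rel_pages)
        sample = rel_pages;
    return sample;
}

int
main(void)
{
    /* ~70 GB table: about 9.2 million 8 KB pages */
    long long rel_pages = 70LL * 1024 * 1024 * 1024 / BLCKSZ;
    long long pages = proposed_sample_pages(rel_pages, 3000,
                                            131072 /* 128 MB in KB */);

    printf("pages sampled: %lld (~%lld MB read)\n",
           pages, pages * BLCKSZ / (1024 * 1024));
    return 0;
}
```

For a 70GB table this works out to roughly 9000 pages, i.e. about 70MB of reads, matching the figure above; the memory cap only kicks in once the table grows well past that size.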
--
--Josh
Josh Berkus
Aglio Database Solutions
San Francisco
| | From | Date | Subject |
|---|---|---|---|
| Next Message | Josh Berkus | 2005-04-25 19:18:26 | Re: [HACKERS] Bad n_distinct estimation; hacks suggested? |
| Previous Message | Simon Riggs | 2005-04-25 18:49:01 | Re: [HACKERS] Bad n_distinct estimation; hacks suggested? |