From: Markus Schaber <schabi(at)logix-tt(dot)com>
To: pgsql-perform <pgsql-performance(at)postgresql(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PERFORM] Bad n_distinct estimation; hacks suggested?
Date: 2005-05-03 13:06:23
Message-ID: 4277774F.7040205@logix-tt.com
Lists: pgsql-hackers pgsql-performance
Hi, Josh,
Josh Berkus wrote:
> Yes, actually. We need 3 different estimation methods:
> 1 for tables where we can sample a large % of pages (say, >= 0.1)
> 1 for tables where we sample a small % of pages but are "easily estimated"
> 1 for tables which are not easily estimated but where we can't afford to
> sample a large % of pages.
>
> If we're doing sampling-based estimation, I really don't want people to lose
> sight of the fact that page-based random sampling is much less expensive than
> row-based random sampling. We should really be focusing on methods which
> are page-based.
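To make the trade-off Josh describes concrete, here is a small sketch (my own illustration, not the planner's actual code): page-based sampling reads whole pages, which is cheap per row, but the naive linear scale-up of the sampled distinct count is exactly the kind of estimator that goes badly wrong for low-cardinality columns.

```python
import random

def page_sample_ndistinct(pages, sample_frac=0.1, seed=42):
    """Hypothetical sketch: sample whole pages (one random I/O buys every
    row on the page), count distinct values in the sample, then scale up
    linearly to the full table. Naive on purpose."""
    rng = random.Random(seed)
    k = max(1, int(len(pages) * sample_frac))
    sampled = rng.sample(pages, k)
    values = [v for page in sampled for v in page]
    d_sample = len(set(values))
    n_total = sum(len(p) for p in pages)
    # Linear scale-up: assumes distinct values grow proportionally with
    # rows, which is false for low-cardinality columns.
    return round(d_sample * n_total / len(values))

# Toy table: 500 pages of 100 rows each, keys drawn from 2000 distinct values.
rng = random.Random(0)
pages = [[rng.randrange(2000) for _ in range(100)] for _ in range(500)]
est = page_sample_ndistinct(pages)
```

On this toy table the linear scale-up grossly overestimates the true 2000 distinct values, because most sampled values already repeat; better estimators weight values seen only once in the sample differently from repeated ones.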
Would it make sense to have a sample method that scans indices? I think
that, at least for tree-based indices (btree, gist), rather good
estimates could be derived.
And the presence of a unique index should yield an estimate of 100%
distinct values without any scan at all.
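The unique-index shortcut above can be sketched as follows (a hypothetical illustration with made-up types, not PostgreSQL's actual catalog structures): if any unique index covers the column, every value is distinct by definition, so n_distinct equals the row count and sampling can be skipped entirely.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Index:
    unique: bool

@dataclass
class Relation:
    row_count: int
    indexes: List[Index] = field(default_factory=list)

def ndistinct_estimate(rel: Relation,
                       sample_estimator: Callable[[Relation], int]) -> int:
    """Return n_distinct for a column: exact via a unique constraint if one
    exists, otherwise fall back to whatever sampling estimator is supplied."""
    if any(ix.unique for ix in rel.indexes):
        # A unique index guarantees all values are distinct: zero scan cost.
        return rel.row_count
    return sample_estimator(rel)
```

The design point is simply that metadata (a uniqueness constraint) can replace an arbitrarily expensive scan with an exact answer.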
Markus