From: Markus Schaber <schabi(at)logix-tt(dot)com>
To: pgsql-perform <pgsql-performance(at)postgresql(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PERFORM] Bad n_distinct estimation; hacks suggested?
Date: 2005-05-03 13:06:23
Message-ID: 4277774F.7040205@logix-tt.com
Lists: pgsql-hackers pgsql-performance
Hi, Josh,
Josh Berkus wrote:
> Yes, actually. We need 3 different estimation methods:
> 1 for tables where we can sample a large % of pages (say, >= 0.1)
> 1 for tables where we sample a small % of pages but are "easily estimated"
> 1 for tables which are not easily estimated but where we can't afford to
> sample a large % of pages.
>
> If we're doing sampling-based estimation, I really don't want people to lose
> sight of the fact that page-based random sampling is much less expensive than
> row-based random sampling. We should really be focusing on methods which
> are page-based.
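To make the trade-off Josh describes concrete, here is a small sketch (my own illustration, not the planner's actual code): page-based sampling reads whole pages, which is cheap per row, but the naive linear scale-up of the sampled distinct count is exactly the kind of estimator that goes badly wrong for low-cardinality columns.

```python
import random

def page_sample_ndistinct(pages, sample_frac=0.1, seed=42):
    """Hypothetical sketch: sample whole pages (one random I/O buys every
    row on the page), count distinct values in the sample, then scale up
    linearly to the full table. Naive on purpose."""
    rng = random.Random(seed)
    k = max(1, int(len(pages) * sample_frac))
    sampled = rng.sample(pages, k)
    values = [v for page in sampled for v in page]
    d_sample = len(set(values))
    n_total = sum(len(p) for p in pages)
    # Linear scale-up: assumes distinct values grow proportionally with
    # rows, which is false for low-cardinality columns.
    return round(d_sample * n_total / len(values))

# Toy table: 500 pages of 100 rows each, keys drawn from 2000 distinct values.
rng = random.Random(0)
pages = [[rng.randrange(2000) for _ in range(100)] for _ in range(500)]
est = page_sample_ndistinct(pages)
```

On this toy table the linear scale-up grossly overestimates the true 2000 distinct values, because most sampled values already repeat; better estimators weight values seen only once in the sample differently from repeated ones.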
Would it make sense to have a sample method that scans indices? I think
that, at least for tree-based indices (btree, gist), rather good
estimates could be derived.
And the presence of a unique index should yield an estimate of 100%
distinct values without any scan at all.
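The unique-index shortcut above can be sketched as follows (a hypothetical illustration with made-up types, not PostgreSQL's actual catalog structures): if any unique index covers the column, every value is distinct by definition, so n_distinct equals the row count and sampling can be skipped entirely.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Index:
    unique: bool

@dataclass
class Relation:
    row_count: int
    indexes: List[Index] = field(default_factory=list)

def ndistinct_estimate(rel: Relation,
                       sample_estimator: Callable[[Relation], int]) -> int:
    """Return n_distinct for a column: exact via a unique constraint if one
    exists, otherwise fall back to whatever sampling estimator is supplied."""
    if any(ix.unique for ix in rel.indexes):
        # A unique index guarantees all values are distinct: zero scan cost.
        return rel.row_count
    return sample_estimator(rel)
```

The design point is simply that metadata (a uniqueness constraint) can replace an arbitrarily expensive scan with an exact answer.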
Markus