From: | John A Meinel <john(at)arbash-meinel(dot)com> |
---|---|
To: | josh(at)agliodbs(dot)com |
Cc: | Mischa Sandberg <mischa(dot)sandberg(at)telus(dot)net>, Markus Schaber <schabi(at)logix-tt(dot)com>, pgsql-perform <pgsql-performance(at)postgresql(dot)org>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: [PERFORM] Bad n_distinct estimation; hacks suggested? |
Date: | 2005-05-04 00:45:17 |
Message-ID: | 42781B1D.7070101@arbash-meinel.com |
Lists: | pgsql-hackers pgsql-performance |
Josh Berkus wrote:
> Mischa,
>
>
>>Okay, although given the track record of page-based sampling for
>>n-distinct, it's a bit like looking for your keys under the streetlight,
>>rather than in the alley where you dropped them :-)
>
>
> Bad analogy, but funny.
>
> The issue with page-based vs. pure random sampling is that to do, for example,
> 10% of rows purely randomly would actually mean loading 50% of pages. With
> 20% of rows, you might as well scan the whole table.
>
> Unless, of course, we use indexes for sampling, which seems like a *really
> good* idea to me ....
>
But doesn't an index only sample one column at a time, whereas with
page-based sampling you can sample all of the columns at once? And not
all columns would have indexes, though it could be assumed that if a
column doesn't have an index, then it doesn't matter as much for
calculations such as n_distinct.
But if you had 5 indexed columns in your table, then doing it index-wise
means you would have to make 5 passes instead of just one.
Though I agree that page-based sampling is important for performance
reasons.
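
As a rough check on the row-versus-page arithmetic quoted above, here is a minimal sketch. It assumes each row is sampled independently with the same probability and that every page holds the same number of rows. The function name and the `rows_per_page` values are hypothetical, chosen only for illustration, and are not taken from PostgreSQL. Under those assumptions the expected fraction of pages touched is 1 - (1 - f)^k for row fraction f and k rows per page.

```python
def expected_page_fraction(row_fraction: float, rows_per_page: int) -> float:
    """Expected fraction of pages that hold at least one sampled row.

    Assumes rows are sampled independently with probability row_fraction
    and that every page holds exactly rows_per_page rows (a simplification).
    """
    # A page is skipped only if none of its rows is picked.
    return 1.0 - (1.0 - row_fraction) ** rows_per_page


if __name__ == "__main__":
    # rows_per_page values are illustrative, not measured.
    for rows_per_page in (7, 50, 100):
        for row_fraction in (0.10, 0.20):
            pages = expected_page_fraction(row_fraction, rows_per_page)
            print(f"{rows_per_page:>3} rows/page, "
                  f"{row_fraction:.0%} of rows -> {pages:.1%} of pages")
```

Under this model, the "10% of rows means 50% of pages" figure corresponds to roughly 7 rows per page. With denser pages, the fraction of pages touched climbs toward 100% even faster.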
John
=:->