From: | "Dave Held" <dave(dot)held(at)arraysg(dot)com> |
---|---|
To: | "pgsql-perform" <pgsql-performance(at)postgresql(dot)org>, <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [HACKERS] Bad n_distinct estimation; hacks suggested? |
Date: | 2005-04-26 22:43:14 |
Message-ID: | 49E94D0CFCD4DB43AFBA928DDD20C8F9026184DE@asg002.asg.local |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers pgsql-performance |
> -----Original Message-----
> From: Gurmeet Manku [mailto:manku(at)CS(dot)Stanford(dot)EDU]
> Sent: Tuesday, April 26, 2005 5:01 PM
> To: Simon Riggs
> Cc: Tom Lane; josh(at)agliodbs(dot)com; Greg Stark; Marko Ristola;
> pgsql-perform; pgsql-hackers(at)postgresql(dot)org; Utkarsh Srivastava;
> snt(at)iastate(dot)edu
> Subject: Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks
> suggested?
>
> [...]
> 2. In a single scan, it is possible to estimate n_distinct by using
> a very simple algorithm:
>
> "Distinct sampling for highly-accurate answers to distinct value
> queries and event reports" by Gibbons, VLDB 2001.
>
> http://www.aladdin.cs.cmu.edu/papers/pdfs/y2001/dist_sampl.pdf
>
> [...]
This paper looks the most promising, and isn't too different
from what I suggested about collecting stats over the whole table
continuously. What Gibbons does is give a hard upper bound on
the sample size by using a logarithmic technique for storing
sample information. His technique appears to offer very good
error bounds and confidence intervals as shown by tests on
synthetic and real data. I think it deserves a hard look from
people hacking the estimator.
__
David B. Held
Software Engineer/Array Services Group
200 14th Ave. East, Sartell, MN 56377
320.534.3637 320.253.7800 800.752.8129
From | Date | Subject | |
---|---|---|---|
Next Message | Greg Stark | 2005-04-26 23:03:07 | Re: [PERFORM] Bad n_distinct estimation; hacks suggested? |
Previous Message | David Wheeler | 2005-04-26 22:06:31 | Re: DO INSTEAD and conditional rules |
From | Date | Subject | |
---|---|---|---|
Next Message | Greg Stark | 2005-04-26 23:03:07 | Re: [PERFORM] Bad n_distinct estimation; hacks suggested? |
Previous Message | Matthew Nuzum | 2005-04-26 22:32:54 | Re: speed up query with max() and odd estimates |