From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | josh(at)agliodbs(dot)com |
Cc: | Greg Stark <gsstark(at)mit(dot)edu>, Marko Ristola <marko(dot)ristola(at)kolumbus(dot)fi>, pgsql-perform <pgsql-performance(at)postgresql(dot)org>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: [HACKERS] Bad n_distinct estimation; hacks suggested? |
Date: | 2005-04-24 04:48:59 |
Message-ID: | 25382.1114318139@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers pgsql-performance |
Josh Berkus <josh(at)agliodbs(dot)com> writes:
> Overall, our formula is inherently conservative of n_distinct. That is, I
> believe that it is actually computing the *smallest* number of distinct
> values which would reasonably produce the given sample, rather than the
> *median* one. This is contrary to the notes in analyze.c, which seem to
> think that we're *overestimating* n_distinct.
Well, the notes are there because the early tests I ran on that formula
did show it overestimating n_distinct more often than not. Greg is
correct that this is inherently a hard problem :-(
I have nothing against adopting a different formula, if you can find
something with a comparable amount of math behind it ... but I fear
it'd only shift the failure cases around.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2005-04-24 05:52:16 | Re: Wierd performance issue with 8.1cvs |
Previous Message | Paul Tillotson | 2005-04-24 02:00:30 | Re: possible TODO: read-only tables, select from indexes |
From | Date | Subject | |
---|---|---|---|
Next Message | Marko Ristola | 2005-04-24 07:15:03 | Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon |
Previous Message | Thomas F.O'Connell | 2005-04-24 02:33:00 | Re: pgbench Comparison of 7.4.7 to 8.0.2 |