From: | marcin mank <marcin(dot)mank(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | kouber(at)saparev(dot)com, pgsql-performance(at)postgresql(dot)org |
Subject: | Re: LIMIT confuses the planner |
Date: | 2009-03-23 00:12:14 |
Message-ID: | b1b9fac60903221712n255525a6w8575ccb4a75d56bb@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
> So the bottom line here is just that the estimated n_distinct is too
> low. We've seen before that the equation we use tends to do that more
> often than not. I doubt that consistently erring on the high side would
> be better though :-(. Estimating n_distinct from a limited sample of
> the population is known to be a statistically hard problem, so we'll
> probably not ever have perfect answers, but doing better is on the
> to-do list.
>
I hit an interestinhg paper on n_distinct calculation:
http://www.pittsburgh.intel-research.net/people/gibbons/papers/distinct-values-chapter.pdf
the PCSA algorithm described there requires O(1) calculation per
value. Page 22 describes what to do with updates streams.
This I think (disclaimer: I know little about PG internals) means that
the n_distinct estimation can be done during vacuum time (it would
play well with the visibility map addon).
What do You think?
Greetings
Marcin
From | Date | Subject | |
---|---|---|---|
Next Message | marcin mank | 2009-03-23 00:56:50 | Re: LIMIT confuses the planner |
Previous Message | Laurent Wandrebeck | 2009-03-22 23:14:52 | Re: "iowait" bug? |