From: | Greg Stark <gsstark(at)mit(dot)edu> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Rod Taylor <pg(at)rbt(dot)ca>, Greg Stark <gsstark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org, Gurmeet Manku <manku(at)cs(dot)stanford(dot)edu> |
Subject: | Re: [PERFORM] Bad n_distinct estimation; hacks suggested? |
Date: | 2005-04-27 05:59:30 |
Message-ID: | 87is2869q5.fsf@stark.xeocode.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers pgsql-performance |
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> Rod Taylor <pg(at)rbt(dot)ca> writes:
> > If when we have partitions, that'll be good enough. If partitions aren't
> > available this would be quite painful to anyone with large tables --
> > much as the days of old used to be painful for ANALYZE.
>
> Yeah ... I am very un-enthused about these suggestions to make ANALYZE
> go back to doing a full scan ...
Well one option would be to sample only a small number of records, but add the
data found from those records to the existing statistics. This would make
sense for a steady-state situation, but make it hard to recover from a drastic
change in data distribution. I think in the case of n_distinct it would also
bias the results towards underestimating n_distinct but perhaps that could be
corrected for.
But I'm unclear for what situation this is a concern.
For most use cases users have to run vacuum occasionally. In those cases
"vacuum analyze" would be no worse than a straight normal vacuum. Note that
this algorithm doesn't require storing more data because of the large scan or
performing large sorts per column. It's purely O(n) time and O(1) space.
On the other hand, if you have tables you aren't vacuuming that means you
perform zero updates or deletes. In which case some sort of incremental
statistics updating would be a good solution. A better solution even than
sampling.
--
greg
From | Date | Subject | |
---|---|---|---|
Next Message | Pavel Stehule | 2005-04-27 06:27:02 | Re: bitmapscan test, no success, bs is not faster |
Previous Message | Mischa Sandberg | 2005-04-27 05:38:04 | Re: [HACKERS] Bad n_distinct estimation; hacks suggested? |
From | Date | Subject | |
---|---|---|---|
Next Message | Simon Riggs | 2005-04-27 07:45:10 | Re: [HACKERS] Bad n_distinct estimation; hacks suggested? |
Previous Message | Mischa Sandberg | 2005-04-27 05:38:04 | Re: [HACKERS] Bad n_distinct estimation; hacks suggested? |