From: Greg Stark <gsstark(at)mit(dot)edu>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: max_wal_senders must die
Date: 2010-10-20 20:33:41
Message-ID: AANLkTimDe72_SBN6Mq0nSpV7pGEeaa4o3HxEh3htx07R@mail.gmail.com
Lists: pgsql-hackers
On Wed, Oct 20, 2010 at 1:12 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Oct 20, 2010 at 3:40 PM, Greg Stark <gsstark(at)mit(dot)edu> wrote:
>> On Wed, Oct 20, 2010 at 6:29 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
>>> Actually, I think the best thing for default_statistics_target might
>>> be to scale the target based on the number of rows in the table, e.g.
>>> given N rows:
>>
>> The number of buckets needed isn't related to the population size --
>> it's related to how wide the ranges you'll be estimating selectivity
>> for are.
>
> As the table grows, the present week's data becomes a
> smaller and smaller fraction of the table data.
That's an interesting point. I wonder if we can expose this in some
way that lets users specify the statistics target in terms of
something more meaningful to them, something that doesn't change as
the ranges in the table grow. Or we could even gather stats on the
size of the ranges actually being queried.
> If you have a WHERE clause of the form WHERE x = some_constant, then
> you get a much better estimate if some_constant is an MCV. If the
> constant is not an MCV, however, you still get better estimates,
> because the estimation code knows that no non-MCV can occur more
> frequently than any MCV, so increasing the number of MCVs pushes those
> estimates closer to reality. It is especially bad when the frequency
> "falls off a cliff" at a certain point in the distribution e.g. if
> there are 243 values that occur much more frequently than any others,
> a stats target of 250 will do much better than 225.
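To make sure I'm reading that right, here's a toy sketch of the
behaviour you're describing (made-up numbers and names, not the
actual selfuncs.c code): the leftover frequency is spread evenly
across the non-MCV distinct values and then capped at the least
common MCV's frequency, so every extra MCV entry both shrinks the
leftover and tightens the cap.

#include <stdio.h>

/* Estimated selectivity of "x = const" when const is not in the MCV list */
static double
nonmcv_selectivity(const double *mcv_freqs, int num_mcvs, double ndistinct)
{
    double  sum_mcv = 0.0;
    double  min_mcv = 1.0;
    double  sel;
    int     i;

    for (i = 0; i < num_mcvs; i++)
    {
        sum_mcv += mcv_freqs[i];
        if (mcv_freqs[i] < min_mcv)
            min_mcv = mcv_freqs[i];
    }

    /* spread the remaining frequency over the remaining distinct values */
    sel = (1.0 - sum_mcv) / (ndistinct - num_mcvs);

    /* no non-MCV can be more common than the least common MCV */
    if (num_mcvs > 0 && sel > min_mcv)
        sel = min_mcv;

    return sel;
}

int
main(void)
{
    double  mcvs[] = {0.10, 0.08, 0.05};

    printf("est. selectivity = %g\n", nonmcv_selectivity(mcvs, 3, 500.0));
    return 0;
}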
It sounds like what we really need here is some way to characterize
the distribution of frequencies. Instead of just computing an upper
bound we should keep a kind of histogram showing how many values
occur precisely once, how many occur twice, three times, and so on.
Or perhaps we only need to know the most common frequency per bucket.
Or, hm...
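As a toy sketch of what I mean (made-up sample and names, nothing
from the backend), computing such a frequency-of-frequencies table is
cheap once the sample is sorted:

#include <stdio.h>
#include <stdlib.h>

#define MAXFREQ 16

static int
cmp_int(const void *a, const void *b)
{
    int     ia = *(const int *) a;
    int     ib = *(const int *) b;

    return (ia > ib) - (ia < ib);
}

int
main(void)
{
    int     sample[] = {7, 3, 7, 9, 3, 7, 1, 9, 7, 2, 3, 7};
    int     n = (int) (sizeof(sample) / sizeof(sample[0]));
    int     freq_of_freq[MAXFREQ + 1] = {0};
    int     i,
            runlen = 1;

    qsort(sample, n, sizeof(int), cmp_int);

    /* count run lengths of equal values, then tally counts of counts */
    for (i = 1; i <= n; i++)
    {
        if (i < n && sample[i] == sample[i - 1])
            runlen++;
        else
        {
            if (runlen <= MAXFREQ)
                freq_of_freq[runlen]++;
            runlen = 1;
        }
    }

    for (i = 1; i <= MAXFREQ; i++)
        if (freq_of_freq[i] > 0)
            printf("%d value(s) occur exactly %d time(s)\n",
                   freq_of_freq[i], i);
    return 0;
}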
--
greg