From: "Shulgin, Oleksandr" <oleksandr(dot)shulgin(at)zalando(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, David Steele <david(at)pgmasters(dot)net>
Subject: Re: More stable query plans via more predictable column statistics
Date: 2016-04-02 18:57:53
Message-ID: CACACo5Qmkt1fdQrGJ_mcLJhJmJ2YpXX-7iB4qen+4sni=YPeRA@mail.gmail.com
Lists: pgsql-hackers
On Apr 2, 2016 18:38, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> "Shulgin, Oleksandr" <oleksandr(dot)shulgin(at)zalando(dot)de> writes:
> > On Apr 1, 2016 23:14, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> Haven't looked at 0002 yet.
>
> > [crosses fingers] hope you'll have a chance to do that before feature
> > freeze for 9.6
>
> I studied this patch for a while after rebasing it onto yesterday's
> commits.
Fantastic! I could not hope for a better reply :-)
> I did not like the fact that the compute_scalar_stats logic
> would allow absolutely anything into the MCV list once num_hist falls
> below 2. I think it's important that we continue to reject values that
> are only seen once in the sample, because there's no very good reason to
> think that they are MCVs and not just infrequent values that by luck
> appeared in the sample.
In my understanding, we only put a value in the track list if we've seen it
at least twice, no?
> However, after I rearranged the tests there so
> that "if (num_hist >= 2)" only controlled whether to apply the 1/K limit,
> one of the regression tests started to fail:
Uh-oh.
> there's a place in
> rowsecurity.sql that expects that if a column contains nothing but several
> instances of a single value, that value will be recorded as a lone MCV.
> Now this isn't a particularly essential thing for that test, but it still
> seems like a good property for ANALYZE to have.
No objection here.
> The reason it's failing,
> of course, is that the test as written cannot possibly accept the last
> (or only) value.
Yeah, this I would expect from such a change.
> Before I noticed the regression failure, I'd been thinking that maybe it'd
> be better if the decision rule were not "at least 100+x% of the average
> frequency of this value and later ones", but "at least 100+x% of the
> average frequency of values after this one".
Hm, sounds pretty similar to what I wanted to achieve, but better
formalized.
> With that formulation, we're
> not constrained as to the range of x. Now, if there are *no* values after
> this one, then this way needs an explicit special case in order not to
> compute 0/0; but the preceding para shows that we need a special case for
> the last value anyway.
>
> So, attached is a patch rewritten along these lines. I used 50% rather
> than 25% as the new cutoff percentage --- obviously it should be higher
> in this formulation than before, but I have no idea if that particular
> number is good or we should use something else. Also, the rule for the
> last value is "at least 1% of the non-null samples". That's a pure guess
> as well.
>
> I do not have any good corpuses of data to try this on. Can folks who
> have been following this thread try it on their data and see how it
> does? Also please try some other multipliers besides 1.5, so we can
> get a feeling for where that cutoff should be placed.
Expect me to run it on my pet db early next week. :-)
Many thanks!
--
Alex