Quick Links

Re: MCV lists for highly skewed distributions

From:	Simon Riggs <simon(at)2ndquadrant(dot)com>
To:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: MCV lists for highly skewed distributions
Date:	2018-01-19 07:42:38
Message-ID:	CANP8+jL5FqBW7ZzRHUChVVBcFRGM84G9Ew-WYOVW=XEVYup53Q@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 28 December 2017 at 01:45, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

> If we stored just a few more values, their inclusion in the MCV would mean
> they are depleted from the residual count, correctly lowering the estimate
> we would get for very rare values not included in the sample.

I agree with this thought.

> So instead of having the threshold of 1.25x the average frequency over all
> values, I think we should use 1.25x the average frequency of only those
> values not already included in the MCV, as in the attached.

It looks like this might even have been the original intention of that code.

Patch looks OK, but I think the comments need minor changes above line 2575

> As it is, you can partially overcome the too short MCV list by cranking up
> the default statistics target, but this is a weak effect and comes at a high
> cost of CPU time. In some of the real distributions I've looked at,
> cranking up default statistics target is almost entirely ineffective.

Agreed, not a good solution.

> [1] Occasionally it will store a much longer MCV list, because no values was
> sampled exactly once, which triggers a different code path in which all seen
> values are put in the MCV and the histogram is NULL. This is not reliable,
> as whether the least-sample value is present in the sample once or twice is
> pretty brittle.

And we need a better discussion of risk: Before we generated too few
MCV entried. To what extent might me now generate too many? Which
would be a problem in increased planning time.

I have a slight reservaton about whether 1.25x is still a sensible
heuristic. Should we add a parameter for that to allow testing during
beta?

Marking as Ready For Committer.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

MCV lists for highly skewed distributions at 2017-12-28 01:45:31 from Jeff Janes

Responses

Re: MCV lists for highly skewed distributions at 2018-01-19 12:33:05 from John Naylor

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Yuto Hayamizu	2018-01-19 08:07:13	Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation
Previous Message	Etsuro Fujita	2018-01-19 06:53:12	Re: [HACKERS] postgres_fdw bug in 9.6