From: | Simon Riggs <simon(at)2ndquadrant(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, John Naylor <jcnaylor(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: MCV lists for highly skewed distributions |
Date: | 2018-02-01 13:16:05 |
Message-ID: | CANP8+jK5iAVP4c0htfDB6p0519fTT-MCzYnAmKgL2LttUxaCrw@mail.gmail.com |
Lists: | pgsql-hackers |

On 25 January 2018 at 22:19, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> writes:
>> It occurs to me that maybe a better test to exclude a value from the
>> MCV list would be to demand that its relative standard error not be
>> too high. Such a test, in addition to the existing tests, might be
>> sufficient to solve the opposite problem of too many values in the MCV
>> list, because the real problem there is including a value after having
>> seen relatively few occurrences of it in the sample, and thus having a
>> wildly inaccurate estimate for it. Setting a bound on the relative
>> standard error would mean that we could have a reasonable degree of
>> confidence in estimates produced from the sample.
>
> This patch is marked Ready for Committer, but that seems wildly optimistic
> based on the state of the discussion. It doesn't look to me like we
> even have consensus on an algorithm, let alone code for it. Certainly,
> whatever we do needs to address the too-many-MCVs issue from [1] as
> well as Jeff's too-few-MCVs case.
>
> Even if we had a plausible patch, I'm not sure how we get to the
> point of having enough consensus to commit. In the previous thread,
> it seemed that some people would object to any change whatsoever :-(
>
> In any case, since it looks like the next step is for someone to come
> up with a new proposal, I'm going to set this to Waiting on Author.

Dean and John's results show that different algorithms work better for
different cases.
How about we make ANALYZE's MCV algorithm pluggable? And then include,
say, 2 additional algorithms.
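
To make the idea concrete, here is a rough sketch of what "pluggable" could look like, using the relative-standard-error test quoted above as one of the plug-ins. This is an illustration only, not PostgreSQL code and not a proposed patch: every name in it (relative_standard_error, select_by_rse, select_above_average, MCV_SELECTORS, build_mcv_list), the 0.2 error bound, and the 1.25 factor are assumptions made up for the example. The error estimate assumes simple random sampling without replacement and uses the usual standard error of a proportion with a finite population correction.

```python
import math
from typing import Callable, Dict, List, Tuple

# (value, number of times it appeared in the sample)
SampleCounts = List[Tuple[str, int]]
McvSelector = Callable[[SampleCounts, int, int], SampleCounts]


def relative_standard_error(count: int, sample_size: int, total_rows: int) -> float:
    """Relative standard error of the sample frequency of one value.

    Standard error of a proportion, scaled by a finite population
    correction, divided by the estimated proportion itself.
    """
    if count <= 0:
        return math.inf
    p = count / sample_size
    # Finite population correction; 0 when the whole table was sampled.
    fpc = (total_rows - sample_size) / (total_rows - 1) if total_rows > 1 else 0.0
    variance = p * (1.0 - p) / sample_size * fpc
    return math.sqrt(variance) / p


def select_by_rse(counts: SampleCounts, sample_size: int, total_rows: int,
                  max_rse: float = 0.2) -> SampleCounts:
    """Keep values whose frequency estimate is reasonably precise.

    max_rse = 0.2 is an arbitrary illustrative bound.
    """
    return [(v, c) for v, c in counts
            if relative_standard_error(c, sample_size, total_rows) <= max_rse]


def select_above_average(counts: SampleCounts, sample_size: int,
                         total_rows: int) -> SampleCounts:
    """Illustrative baseline: keep values clearly more common than average."""
    if not counts:
        return []
    avg = sum(c for _, c in counts) / len(counts)
    return [(v, c) for v, c in counts if c > 1.25 * avg]


# Hypothetical registry: the "pluggable" part is just a lookup by name.
MCV_SELECTORS: Dict[str, McvSelector] = {
    "rse": select_by_rse,
    "above_average": select_above_average,
}


def build_mcv_list(counts: SampleCounts, sample_size: int, total_rows: int,
                   algorithm: str = "rse") -> SampleCounts:
    """Dispatch to the chosen selection algorithm."""
    return MCV_SELECTORS[algorithm](counts, sample_size, total_rows)


if __name__ == "__main__":
    sample = [("a", 9000), ("b", 900), ("c", 3)]
    for name in MCV_SELECTORS:
        print(name, build_mcv_list(sample, 10000, 1000000, name))
```

With a 10000-row sample, a value seen only 3 times has a relative standard error of a little under 0.6, so the "rse" selector drops it, while the two common values are kept. The baseline selector makes a different choice on the same input, which is the kind of case-by-case difference a pluggable interface would let people compare.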
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services