From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> |
Cc: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Mark Dilger <hornschnorter(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [HACKERS] PATCH: multivariate histograms and MCV lists |
Date: | 2019-03-25 23:36:25 |
Message-ID: | add2f0d3-4fbc-af7c-e02e-0f2b2d6dedc3@2ndquadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
Attached is an updated patch, fixing all the issues pointed out so far.
Unless there are some objections, I plan to commit the 0001 part by the
end of this CF. Part 0002 is a matter for PG13, as previously agreed.
On 3/24/19 1:17 AM, David Rowley wrote:
> On Sun, 24 Mar 2019 at 12:41, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>
>> On 3/21/19 4:05 PM, David Rowley wrote:
>>> 11. In get_mincount_for_mcv_list() it's probably better to have the
>>> numerical literals of 0.0 instead of just 0.
>>
>> Why?
>
> Isn't it what we do for float and double literals?
>
OK, fixed.
>>
>>> 12. I think it would be better if you modified build_attnums_array()
>>> to add an output argument that sets the size of the array. It seems
>>> that most places you call this function you perform bms_num_members()
>>> to determine the array size.
>>
>> Hmmm. I've done this, but I'm not sure I like it very much - there's no
>> protection the value passed in is the right one, so the array might be
>> allocated either too small or too large. I think it might be better to
>> make it work the other way, i.e. pass the value out instead.
>
> When I said "that sets the size", I meant "that gets set to the size",
> sorry for the confusion. I mean, if you're doing bms_num_members()
> inside build_attnums_array() anyway, then this will save you from
> having to do that in the callers too.
>
OK, I've done this now, and I'm fairly happy with it.
>>> 28. Can you explain what this is?
>>>
>>> uint32 type; /* type of MCV list (BASIC) */
>>>
>>> I see: #define STATS_MCV_TYPE_BASIC 1 /* basic MCV list type */
>>>
>>> but it's not really clear to me what else could exist. Maybe the
>>> "type" comment can explain there's only one type for now, but more
>>> might exist in the future?
>>>
>>
>> It's the same idea as for dependencies/mvdistinct stats, i.e.
>> essentially a version number for the data structure so that we can
>> perhaps introduce some improved version of the data structure in the future.
>>
>> But now that I think about it, it seems a bit pointless. We would only
>> do that in a major version anyway, and we don't keep statistics during
>> upgrades. So we could just as well introduce the version/flag/... if
>> needed. We can't do this for regular persistent data, but for stats it
>> does not matter.
>>
>> So I propose we just remove this thingy from both the existing stats and
>> this patch.
>
> I see. I wasn't aware that existed for the other types. It certainly
> gives some wiggle room if some mistakes were discovered after the
> release, but thinking about it, we could probably just change the
> "magic" number and add new code in that branch only to ignore the old
> magic number, perhaps with a WARNING to analyze the table again. The
> magic field seems sufficiently early in the struct that we could do
> that. In the master branch we'd just error if the magic number didn't
> match, since we wouldn't have to deal with stats generated by the
> previous version's bug.
>
OK. I've decided to keep the field for now, for sake of consistency with
the already existing statistic types. I think we can rethink that in the
future, if needed.
>>> 29. Looking at the tests I see you're testing that you get bad
>>> estimates without extended stats. That does not really seem like
>>> something that should be done in tests that are meant for extended
>>> statistics.
>>>
>>
>> True, it might be a bit unnecessary. Initially the tests were meant to
>> show old/new estimates for development purposes, but it might not be
>> appropriate for regression tests. I don't think it's a big issue, it's
>> not like it'd slow down the tests significantly. Opinions?
>
> My thoughts were that if someone did something to improve non-MV
> stats, then is it right for these tests to fail? What should the
> developer do in the case? update the expected result? remove the test?
> It's not so clear.
>
I think such changes would affect a number of other places in regression
tests (changing plans, ...), so I don't see why fixing these tests would
be any different.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment | Content-Type | Size |
---|---|---|
0001-multivariate-MCV-lists-20190325.patch.gz | application/gzip | 39.9 KB |
0002-multivariate-histograms-20190325.patch.gz | application/gzip | 48.5 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tomas Vondra | 2019-03-25 23:39:55 | Re: [HACKERS] PATCH: multivariate histograms and MCV lists |
Previous Message | David Rowley | 2019-03-25 23:29:49 | Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath |