
Re: [HACKERS] PATCH: multivariate histograms and MCV lists

From:	Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To:	Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Mark Dilger <hornschnorter(at)gmail(dot)com>, Adrien Nayrat <adrien(dot)nayrat(at)dalibo(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Date:	2018-03-26 19:09:47
Message-ID:	66f652d0-030a-2f53-df85-effb272a5919@2ndquadrant.com
Lists:	pgsql-hackers

On 03/26/2018 06:21 PM, Dean Rasheed wrote:
> On 26 March 2018 at 14:08, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> On 03/26/2018 12:31 PM, Dean Rasheed wrote:
>>> A wider concern I have is that I think this function is trying to be
>>> too clever by only resetting selected stats. IMO it should just reset
>>> all stats unconditionally when the column type changes, which would
>>> be consistent with what we do for regular stats.
>>>
>> The argument a year ago was that it's more plausible that the semantics
>> remains the same. I think the question is how the type change affects
>> precision - had the type change in the opposite direction (int to real)
>> there would be no problem, because both ndistinct and dependencies would
>> produce the same statistics.
>>
>> In my experience people are far more likely to change data types in a
>> way that preserves precision, so I think the current behavior is OK.
>
> Hmm, I don't really buy that argument. Altering a column's type
> allows the data in it to be rewritten in arbitrary ways, and I don't
> think we should presume that the statistics will still be valid just
> because the user *probably* won't do something that changes the data
> much.
>

Maybe. I can only really speak from my experience, and in those cases
it's usually "the column is an INT and I need a FLOAT". But you're right
that it's not guaranteed to be like that, so perhaps the right thing to
do is to reset the stats.

Another reason to do that might be consistency - resetting just some of
the stats might be surprising for users. And we are already resetting
per-column stats on that column, so users will be running ANALYZE anyway.

BTW in my response I claimed this:

>
> The other reason is that when reducing precision, it generally
> enforces the dependency (you can't violate functional dependencies or
> break grouping by merging values). So you will have stale stats with
> weaker dependencies, but it's still better than not having any.

That's actually bogus. For example for functional dependencies, it's
important on which side of the dependency we reduce precision. With
(a->b) dependency, reducing precision of "b" does indeed strengthen it,
but reducing precision of "a" does weaken it. So I take that back.
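
To make the asymmetry concrete, here is a minimal Python sketch. It is not
the PostgreSQL implementation: dependency_degree is a hypothetical helper
that approximates the strength of (a->b) as the fraction of rows whose "a"
value maps to exactly one "b" value, and integer division stands in for a
type change that reduces precision. The sample data is made up.

```python
from collections import defaultdict


def dependency_degree(rows):
    """Fraction of rows whose 'a' value maps to exactly one 'b' value.

    Simplified stand-in for the strength of a functional dependency a->b;
    an assumption for illustration, not PostgreSQL's actual algorithm.
    """
    b_values = defaultdict(set)   # a -> distinct b values seen with it
    row_count = defaultdict(int)  # a -> number of rows with that a
    for a, b in rows:
        b_values[a].add(b)
        row_count[a] += 1
    consistent = sum(n for a, n in row_count.items() if len(b_values[a]) == 1)
    return consistent / len(rows)


# b is derived from a, plus one row that violates the dependency.
rows = [(a, a // 2) for a in range(10)] + [(0, 1)]

# Reduce precision of "b": distinct b values merge, violations disappear.
coarse_b = [(a, b // 2) for a, b in rows]

# Reduce precision of "a": distinct a values merge, each merged group now
# sees several b values.
coarse_a = [(a // 4, b) for a, b in rows]

print(dependency_degree(rows))      # 9/11
print(dependency_degree(coarse_b))  # 1.0
print(dependency_degree(coarse_a))  # 2/11
```

With this data the dependency gets stronger when "b" is coarsened and much
weaker when "a" is coarsened, so stale dependency stats can be wrong in
either direction after a type change.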

So, I'm not particularly opposed to just resetting extended stats
referencing the altered column.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Re: [HACKERS] PATCH: multivariate histograms and MCV lists at 2018-03-26 16:21:06 from Dean Rasheed
