From: | Kevin Grittner <kgrittn(at)ymail(dot)com> |
---|---|
To: | Stephen Frost <sfrost(at)snowman(dot)net> |
Cc: | Hannu Krosing <hannu(at)2ndQuadrant(dot)com>, Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>, Steve Singer <steve(at)ssinger(dot)info>, Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: record identical operator |
Date: | 2013-09-20 15:01:20 |
Message-ID: | 1379689280.4270.YahooMailNeo@web162906.mail.bf1.yahoo.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> * Kevin Grittner (kgrittn(at)ymail(dot)com) wrote:
>> The result of this is that with the patch, an incremental refresh
>> of a matview will always match what the query returned at the time
>> it was run (there is no *correctness* problem) but if someone uses
>> a query with non-deterministic results they may have a lot of
>> activity on a concurrent refresh even if there was no change to the
>> underlying data -- so you could have a *performance* penalty in
>> cases where the query returns something different, compared to
>> leaving the old "equal but not the same" results.
>
> You mean 'at the time of the incremental refresh', right? Yet that may
> or may not match running that query directly by an end-user as the plan
> that a user might get for the entire query could be different than what
> is run for an incremental update, or due to statistics updates, etc.
I'm confused. The refresh *does* run the query. Sure, if the
query is run again it could return different results. I'm missing
the point here.
>>> Consider a GROUP BY with a citext column as one of the key
>>> fields. You're going to get whatever value the aggregate
>>> happened to come across first when building the HashAgg.
>>> Having the binary equality operator doesn't help that and seems
>>> like it could even result in change storms happening due to a
>>> different plan when the actual data didn't change.
>>
>> Yup. A person who wants to GROUP BY a citext value for a matview
>> might want to fold it to a consistent capitalization to avoid that
>> issue.
>
> I'm trying to figure out why that's a perfectly acceptable solution for
> users running views with GROUP BYs, but apparently it isn't sufficient
> for mat views? In other words, you would suggest telling users "sorry,
> you can't rely on the value returned by a GROUP BY on that citext column
> using a normal view, but we're going to try and do better for mat
> views".
Again, I'm lost. If they don't do something to make the result
deterministic, it could be different on each run of the VIEW, and
on each REFRESH of the matview. I don't see why that is an
argument for trying to suppress the effort needed make the matview
match the latest run of the query.
> I don't intend the above to imply that we should never update values in
> mat views when we can do so in a reliable way to ensure that the value
> matches what a view would return. This matches our notion of UPDATE,
> where we will still UPDATE a value even if the old value and the new
> value are equal according to the type's equality operator, when the
> conditional for the UPDATE is using a reliable type (eg: integer).
Well, we provide a trigger function to suppress the UPDATE
operation if the old and new values are identical -- in terms of
what is stored. We do not attempt to use the default btree equals
operator to suppress updates to different values in some
circumstances.
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | Stephen Frost | 2013-09-20 15:05:06 | Re: record identical operator |
Previous Message | Andres Freund | 2013-09-20 14:55:06 | Re: record identical operator |