Re: record identical operator

From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>, Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>, Steve Singer <steve(at)ssinger(dot)info>, Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: record identical operator
Date: 2013-09-20 13:49:30
Message-ID: 1379684970.83524.YahooMailNeo@web162903.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> * Kevin Grittner (kgrittn(at)ymail(dot)com) wrote:
>> ... like
>> just refreshing a view so that the new contents are the same as
>> what you would see if you re-ran the query defining the matview.
>
> I've heard this notion a few times of wanting to make sure that what you
> get from running the query matches the matview.  While that makes sense
> when the equality operator and what-you-see actually match, it doesn't
> when they don't because the what-you-see ends up being non-deterministic
> and can change based on the order the datums are seen in during the
> query processing which can change with different data ordering on disk
> and even due to simply different plans for the same data, I believe.

That's a fair point to some extent.  Where notions of equality
differ, it is not always non-deterministic, but it can be.  For
citext you are correct.  For a sum() of numeric data, the number of
decimal positions will be the largest value seen; the value present
in the query results will not vary by order of rows scanned or by
plan.

The result of this is that with the patch, an incremental refresh
of a matview will always match what the query returned at the time
it was run (there is no *correctness* problem) but if someone uses
a query with non-deterministic results they may have a lot of
activity on a concurrent refresh even if there was no change to the
underlying data -- so you could have a *performance* penalty in
cases where the query returns something different, compared to
leaving the old "equal but not the same" results.

> Consider a GROUP BY with a citext column as one of the key fields.
> You're going to get whatever value the aggregate happened to come across
> first when building the HashAgg.  Having the binary equality operator
> doesn't help that and seems like it could even result in change storms
> happening due to a different plan when the actual data didn't change.

Yup.  A person who wants to GROUP BY a citext value for a matview
might want to fold it to a consistent capitalization to avoid that
issue.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2013-09-20 13:51:55 Re: Minor inheritance/check bug: Inconsistent behavior
Previous Message Stephen Frost 2013-09-20 13:48:09 Re: record identical operator - Review