From: | Andres Freund <andres(at)2ndquadrant(dot)com> |
---|---|
To: | Kevin Grittner <kgrittn(at)ymail(dot)com> |
Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: record identical operator |
Date: | 2013-09-14 18:58:32 |
Message-ID: | 20130914185832.GA2291@awork2.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 2013-09-14 11:25:52 -0700, Kevin Grittner wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>
> > what I am talking about is that
> > e.g.: SELECT (ARRAY[1,2,3,NULL])[1:3] = ARRAY[1,2,3];
> > obviously should be true.
>
> The patch does not change the behavior of the = operator for any
> type under any circumstances.
Yes, sure. I wasn't thinking you would.
> > But both arrays don't have the same binary representation since
> > the former has a null bitmap, the latter not. So, if you had a
> > composite type like (int4[]) and would compare that without
> > invoking operators you'd return something false in some cases
> > because of the null bitmaps.
>
> Not for the = operator. The new "identical" operator would find
> them to not be identical, though.
Yep. And I think that's a problem if exposed to SQL. People won't
understand the hazards and end up using it because its faster or
somesuch.
> Since the new operator is only for the record type, I need to wrap
> the values in your example:
Yes.
> The REFRESH causes them to match again, and later REFRESH runs
> won't see a need to do any work there unless the on-disk
> representation changes again.
Yes, I understand that the matview code itself will just perform
superflous work. We use such comparisons in other parts of the code
similarly.
> As far as I can see, we have four choices:
>
> (1) Never update values that are "equal", even if they appear
> different to the users, as was demonstrated with the citext
> example.
I think, introducing a noticeable amount of infrastructure for this just
because of citext is a bad idea.
At some point we need to replace citext with proper case-insensitive
collation support - then it really might become necessary.
> (2) Require every data type which can be used in a matview to
> implement some new operator or function for "identical". Perhaps
> that could be mitigated to only implementat it if equal values can
> have user-visible differences.
That basically would require adding a new member to btree opclasses that
btrees don't need themselves... Hm.
> (3) Embed special cases into record identical tests for types
> known to allow multiple on-disk representations which have no
> user-visible differences.
I think this is a complete nogo. a) I don't forsee we know of all these
cases b) it wouldn't be extensible.
Oh. Now that I've read further, I see you feel the same. Good ;)
> (4) Base the need to update a matview column on whether its
> on-disk representation is identical to what a new run of the
> defining query would generate. If this causes performance problems
> for use of a given type in a matview, one possible solution would
> be to modify that particular type to use a canonical format when
> storing a value into a record. For example, storing an array which
> has a bitmap of null values even though there are no nulls in the
> array could strip the bitmap as it is stored to the record.
If matview refreshs weren't using plain SQL and thus wouldn't require
exposing that operator to SQL I wouldn't have a problem with this...
There's the ungodly ugly choice of having an matview_equal function (or
operator) that checks if we're doing a refresh atm...
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Andrew Dunstan | 2013-09-14 19:03:52 | Re: git apply vs patch -p1 |
Previous Message | Marko Tiikkaja | 2013-09-14 18:52:40 | Re: Assertions in PL/PgSQL |