From: | Andres Freund <andres(at)2ndquadrant(dot)com> |
---|---|
To: | Kevin Grittner <kgrittn(at)ymail(dot)com> |
Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: record identical operator |
Date: | 2013-09-13 21:59:00 |
Message-ID: | 20130913215900.GB7437@awork2.anarazel.de |
Lists: | pgsql-hackers |
On 2013-09-13 14:36:27 -0700, Kevin Grittner wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > On 2013-09-12 15:27:27 -0700, Kevin Grittner wrote:
> >> The new operator is logically similar to IS NOT DISTINCT FROM for a
> >> record, although its implementation is very different. For one
> >> thing, it doesn't replace the operation with column level operators
> >> in the parser. For another thing, it doesn't look up operators for
> >> each type, so the "identical" operator does not need to be
> >> implemented for each type to use it as shown above. It compares
> >> values byte-for-byte, after detoasting. The test for identical
> >> records can avoid the detoasting altogether for any values with
> >> different lengths, and it stops when it finds the first column with
> >> a difference.
> >
> > In the general case, that operator sounds dangerous to me. We don't
> > guarantee that a Datum containing the same data always has the same
> > binary representation. E.g. an array may or may not have a null bitmap,
> > depending on how it was created.
> >
> > I am not actually sure whether that's a problem for your use case, but I
> > get headaches when we try circumventing the type abstraction that way.
> >
> > Yes, we do such tricks in other places already, but afaik in all those
> > places erroneously believing two Datums are distinct is not an error, just
> > a missed optimization. Allowing a general operator with such a murky
> > definition to creep into something SQL exposed... Hm. Not sure.
>
> Well, the only two alternatives I could see were to allow
> user-visible differences not to be carried to the matview if the
> old and new values were considered "equal", or to implement an
> "identical" operator or function in every type that was to be
> allowed in a matview. Given those options, what's in this patch
> seemed to me to be the least evil.
>
> It might be worth noting that this scheme doesn't have a problem
> with correctness if there are multiple equal values which are not
> identical, as long as any two identical values are equal. If the
> query which generates contents for a matview generates
> non-identical but equal values from one run to the next without any
> particular reason, that might cause performance problems.
I am not actually that concerned with MVCs using this; you're quite
capable of analyzing the dangers. What I am wary of is exposing an
operator that's basically broken from the get-go to SQL.
Now, the obvious issue there is that matviews use SQL to refresh :(
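
To make the distinction between "identical" and "equal" concrete, here is a
minimal sketch in Python. It uses a made-up toy array encoding (a flag byte, a
count, an optional null bitmap, then the non-null int32 values); this is an
assumption for illustration only and is not PostgreSQL's actual array layout.
The function names encode, decode, identical and equal are likewise
hypothetical.

```python
import struct
from typing import List, Optional

HEADER = "<BI"  # toy header: has-bitmap flag (1 byte), element count (4 bytes)


def encode(values: List[Optional[int]], with_bitmap: bool) -> bytes:
    """Serialize to the toy layout; a bitmap is optional unless NULLs exist."""
    if not with_bitmap and any(v is None for v in values):
        raise ValueError("NULL elements require a bitmap")
    out = bytearray(struct.pack(HEADER, 1 if with_bitmap else 0, len(values)))
    if with_bitmap:
        bitmap = bytearray((len(values) + 7) // 8)
        for i, v in enumerate(values):
            if v is not None:
                bitmap[i // 8] |= 1 << (i % 8)  # bit set means "not NULL"
        out += bitmap
    for v in values:
        if v is not None:
            out += struct.pack("<i", v)
    return bytes(out)


def decode(buf: bytes) -> List[Optional[int]]:
    """Recover the logical array from either representation."""
    has_bitmap, n = struct.unpack_from(HEADER, buf, 0)
    pos = struct.calcsize(HEADER)
    if has_bitmap:
        nbytes = (n + 7) // 8
        bitmap = buf[pos:pos + nbytes]
        pos += nbytes
        present = [bool((bitmap[i // 8] >> (i % 8)) & 1) for i in range(n)]
    else:
        present = [True] * n
    values: List[Optional[int]] = []
    for is_present in present:
        if is_present:
            values.append(struct.unpack_from("<i", buf, pos)[0])
            pos += 4
        else:
            values.append(None)
    return values


def identical(a: bytes, b: bytes) -> bool:
    """Byte-for-byte comparison; differing lengths short-circuit."""
    return len(a) == len(b) and a == b


def equal(a: bytes, b: bytes) -> bool:
    """Type-aware comparison of the decoded logical values."""
    return decode(a) == decode(b)


without_bitmap = encode([1, 2, 3], with_bitmap=False)
with_bitmap = encode([1, 2, 3], with_bitmap=True)
print(equal(without_bitmap, with_bitmap))      # same logical array
print(identical(without_bitmap, with_bitmap))  # different bytes
```

In this toy format, identical values always decode to equal values, but the
reverse does not hold: the same logical array has two byte representations.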
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services