From: | Kevin Grittner <kgrittn(at)ymail(dot)com> |
---|---|
To: | Bruce Momjian <bruce(at)momjian(dot)us>, Steve Singer <steve(at)ssinger(dot)info> |
Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: record identical operator - Review |
Date: | 2013-09-27 22:34:20 |
Message-ID: | 1380321260.69363.YahooMailNeo@web162904.mail.bf1.yahoo.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> On Thu, Sep 19, 2013 at 06:31:38PM -0400, Steve Singer wrote:
>> On 09/12/2013 06:27 PM, Kevin Grittner wrote:
>>> Attached is a patch for a bit of infrastructure I believe to be
>>> necessary for correct behavior of REFRESH MATERIALIZED VIEW
>>> CONCURRENTLY as well as incremental maintenance of matviews.
>>
>> Here is attempt at a review of the v1 patch.
>>
>> There has been a heated discussion on list about if we even want
>> this type of operator. I will try to summarize it here
>>
>> The arguments against it are
>> * that a record_image_identical call on two records that returns
>> false can still return true under the equality operator, and the
>> records might (or might not) appear to be the same to users
>> * Once we expose this as an operator via SQL someone will find
>> it and use it and then complain when we change things such as
>> the internal representation of a datatype which might break
>> there queries
I don't see where that is possible unless they count on the order
of records sorted with the new operators, which would not be
something I recommend. Changing the internal representation cannot
break simple use of the alternative equality definition as long as
all you cared about was whether the binary storage format of the
values was identical.
>> * The differences between = and record_image_identical might not
>> be understood by everywhere who finds the operator leading to
>> confusion
Either they find it and read the code, or we document it and they
read the docs.
>> * Using a criteria other than equality (the = operator provided
>> by the datatype) might cause problems if we later provide 'on
>> change' triggers for materialized views
I would fight user-defined triggers for matviews tooth and nail.
We need to get incremental maintenance working instead.
>> The arguments for this patch are
>> * We want the materialized view to return the same value as
>> would be returned if the query were executed directly. This not
>> only means that the values should be the same according to a
>> datatypes = operator but that they should appear the same 'to
>> the eyeball'.
And to functions the user can run against the values. The example
with the null bitmap for arrays being included when not necessary
is something that isn't directly apparent to the eye, but queries
which use pg_column_size would not get equal results.
>> * Supporting the materialized view refresh check via SQL makes a
>> lot of sense but doing that requires exposing something via SQL
>> * If we adequately document what we mean by
>> record_image_identical and the operator we pick for this then
>> users shouldn't be surprised at what they get if they use this
We first need to document the existing record comparison operators.
If they read the docs for comparing "row_constructors" and expect
that to be the behavior they get when they compare records, they
will be surprised.
> This is a good summary. I think there are a few things that make
> this issue difficult to decide:
>
> 1. We have to use an operator to give the RMVC (REFRESH
> MATERIALIZED VIEW CONCURRENTLY) SPI query the ability to optimize
> this query. If we could do this with an SQL or C function, there
> would be less concern about the confusion caused by adding this
> capability.
(a) We can't.
(b) Why would that be less confusing?
> Question: If we are comparing based on some primary key, why do
> we need this to optimize?
Because comparing primary keys doesn't tell us whether the old and
new values in the row all match.
> Certainly the primary key index will be used, no?
It currently uses all columns which are part of unique indexes on
the matview which are not partial (i.e., they do not use a WHERE
clause), they index only on columns (not expressions). It is not
possible to define a primary key on a matview. There are two
reasons for using these columns:
(a) It provides a correctness guarantee against having duplicated
rows which have no nulls. This guarantee is what allows us to use
the differential technique which is faster than retail DELETE and
re-INSERT of everything.
(b) Since we don't support hash joins of records, and it would
tend to be pretty slow if we did, it allows faster hash joins on
one or more columns which stand a good chance of doing this
efficiently.
> 2. Agregates are already non-deterministic,
Only some of them, and only with certain source data.
> so some feel that adding this feature doesn't improve much.
It allows a materialized view which contains a column of a type
without a default btree opclass to *use* concurrent refresh, it
makes queries with deterministic results always generate matview
results with exactly match the results of a VIEW using the query if
it were run at the same moment, and for nondeterministic queries,
it provides results consistent with *some* result set which could
have been returned by a view at the same time. Unless we do
something, none of these things are true. That seems like much to
me.
> The counter-argument is that without the binary row comparison
> ability, rows could be returned that could _never_ have been
> produced by the base data, which is more of a user surprise than
> non-deterministic rows.
Yes, having the matview represent results from from running the
query at *some* point in time is *one* of the benefits.
> 3. Our type casting and operators are already complex, and
> adding another set of operators only compounds that.
It cannot have any effect on any of the existing operators, so I'm
not sure whether you are referring to the extra operators and
functions, or something else. It does not, for example, introduce
any risk of "ambiguous operators".
> 4. There are legitimate cases where tool makers might want the
> ability to compare rows binarily, e.g. for replication, and
> adding these operators would help with that.
>
> I think we need to see a patch from Kevin that shows all the row
> comparisons documented and we can see the impact of the
> user-visibile part.
I'm inclined to first submit a proposed documentation patch for the
existing record operators, and see if we can make everyone happy
with that, and *then* see about adding documentation for the new
ones. Trying to deal with both at once is likely to increase
confusion and wheel-spinning.
> One interesting approach would be to only allow the operator to
> be called from SPI queries.
Why would that be a good idea?
> It would also be good to know about similar non-default entries
> in pg_opclass so we can understand the expected impact.
A quick query (lacking schema information and schema qualification)
shows what is there by default:
select amname, typname, opcname
from pg_opclass c
join pg_am a on a.oid = c.opcmethod
join pg_type t on t.oid = c.opcintype
where not opcdefault
order by 1, 2, 3;
amname | typname | opcname
--------+---------+---------------------
btree | bpchar | bpchar_pattern_ops
btree | inet | cidr_ops
btree | record | record_image_ops
btree | text | text_pattern_ops
btree | text | varchar_ops
btree | text | varchar_pattern_ops
hash | bpchar | bpchar_pattern_ops
hash | inet | cidr_ops
hash | text | text_pattern_ops
hash | text | varchar_ops
hash | text | varchar_pattern_ops
spgist | point | kd_point_ops
(12 rows)
The text_pattern_ops opclass, for example adds memcmp based
comparisons for text strings, that differ from what the default
comparisons regarding greater or lesser values:
test=# select 'a' < E'\u02E5';
?column?
----------
f
(1 row)
test=# select 'a' ~<~ E'\u02E5';
?column?
----------
t
(1 row)
Then there are types with no default btree opclass, so "normal"
equality tests are not defined. When I run a simple query for such
types I get hundreds, although some of those may not be important
for this discussion. Let's pick one, just as an example: box.
test=# select '(2,2),(1,1)'::box = '(12,82),(12.1,92)'::box;
?column?
----------
t
(1 row)
test=# \set A '''(1.9999996,1.9999996),(1,1)''::box'
test=# \set B '''(2,2),(1,1)''::box'
test=# \set C '''(2.0000004,2.0000004),(1,1)''::box'
test=# select :A = :B, :B = :C, :A = :C;
?column? | ?column? | ?column?
----------+----------+----------
t | t | f
(1 row)
The = operator only tells you whether the boxes are approximately
the same size. There is an operator which checks for "same as?"
based on all corners matching, but it still allows EPSILON
differences:
test=# select '(2,2),(1,1)'::box ~= '(12,82),(12.1,92)'::box;
?column?
----------
f
(1 row)
test=# select :A ~= :B, :B ~= :C, :A ~= :C;
?column? | ?column? | ?column?
----------+----------+----------
t | t | t
(1 row)
These infelicities are not too painful if they are not incorporated
into an opclass, however. But lack of a btree opclass for the type
means you get this sort of problem with current "record =":
test=# select (row ('(2,2),(1,1)'::box))::record = (row ('(2,2),(1,1)'::box))::record;
ERROR: could not identify an equality operator for type box
The new operator allows record comparisons when the record contains
such types:
test=# select (row ('(2,2),(1,1)'::box))::record *= (row ('(2,2),(1,1)'::box))::record;
?column?
----------
t
(1 row)
... and would notice if you changed between "equal" values:
test=# select (row (:A))::record *= (row (:B))::record,
test-# (row (:B))::record *= (row (:C))::record,
test-# (row (:A))::record *= (row (:C))::record;
?column? | ?column? | ?column?
----------+----------+----------
f | f | f
(1 row)
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Geoghegan | 2013-09-28 00:36:45 | Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE |
Previous Message | Christopher Browne | 2013-09-27 22:17:56 | Re: Extra functionality to createuser |