Re: record identical operator

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Steve Singer <steve(at)ssinger(dot)info>, Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: record identical operator
Date: 2013-09-24 01:21:53
Message-ID: 20130924012153.GN2706@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

* Kevin Grittner (kgrittn(at)ymail(dot)com) wrote:
> I'm trying to figure out what situation you think we're in.
> Seriously, if you could apply the patch and show one example that
> demonstrates what you see to be a problem, that would be great.

Here's an example to illustrate what I'm talking about when it comes
down to "you can't claim that you'll produce exactly what the query
will always, with these types:"

=# table citext_table;
id | name
----+-------
1 | one
3 | three
5 |
4 | Three
2 | Three
(5 rows)

=# create MATERIALIZED VIEW citext_matview AS select name from citext_table where id > 0 group by name;
SELECT 3

=# table citext_matview;
name
-------

one
three
(3 rows)

=# set enable_seqscan = false;
SET

=# select name from citext_table where id > 0 group by name;
name
-------
one
Three

(3 rows)

Yes, the 'set enable_seqscan = false;' is a bit of a cheat- but I hope
you agree that it *shouldn't* change the *result* of a query. There's
certainly other cases where a different plan could be picked and result
in the same kind of difference between the matview and running the
query.

Build a matview w/ max(id) as id and a unique index on 'id' and you'll
still get the same issue. The problem is that the seqscan will pick up
on 'three' first while the index scan will find 'Three' first.

This is all with the patch from
http://www.postgresql.org/message-id/1379024847.48294.YahooMailNeo@web162904.mail.bf1.yahoo.com
applied.

I simply don't believe it's possible to be consistent in this unless we
declare the type's equality to be insufficient for us and, everywhere
that we call a type's equality operator, *also* check which of the equal
values involved in the comparison is "bigger" or "lower" (based on
binary comparison) and actually pick one, consistently, all the time.

This is why I'm complaining that we're trying to paper over a difference
that just isn't that simple. I understand why you're trying to- it's
definitely annoying, but this isn't so much a solution as it is a hack
that doesn't actually work in all cases.

I don't have a silver bullet for this but I don't like saying "let's
implement these binary operators and make things *look* consistent in
most cases, and then fail subtly in complex situations." I agree that
users may complain if the underlying relation ends up not having *any*
entries with the value that's in the matview and a refresh doesn't
update it.

I expect users will *also* complain if we implement these operators and
then they write their own queries using this awesome new 'binary'
comparison operator to compare their equal-to-citext strings- and
discover that the binary operators say things that *look* the same are
actually *different* (thinking encoding issues, overlong encodings,
etc). That'd be a subtle and painful bug (technically in their code,
not ours, but still) to find, and then what do you do? Argueing against
citext (as I've had to do in the past..) would probably be more
difficult for some if they saw these operators and figured they could
use them.

As I mentioned in the thread w/ Robert, when looking at further
optimizations down the road, we're going to be in a situation of looking
at only a subset of the records and then considering what to do when
supporting MAX() efficiently leads us to running 'greater(existing,new)'
which returns false, but q and r are binary different. This argument
is saying "always replace what's there" but every other, existing, value
in the table might be represented by the 'existing' representation.

Thanks,

Stephen

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2013-09-24 03:08:10 Re: record identical operator
Previous Message Peter Geoghegan 2013-09-23 23:16:59 Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE