Re: BUG #14546: "point" type does not work with "IS DISTINCT"

From: "Casey Witt" <kcwitt(at)gmail(dot)com>
To: <emre(at)hasegeli(dot)com>, "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "'PostgreSQL Bugs'" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #14546: "point" type does not work with "IS DISTINCT"
Date: 2017-02-18 02:37:23
Message-ID: 004901d2898f$f273f520$d75bdf60$@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Emre,

Yeah, changing existing Macros is tough, because it can break backwards compatibility.

I had in mind that fixing this for a point would be easy since the "=" and btree operators are not defined, no existing code bases would be impacted by implementing them, but when I re-read Tom Lanes email I noticed the line that says:
"The same problems exist for other geometric types, usually worse, because many of the others do have "=" operators but they compare areas :-(.

I feel like it is tragic that "=" got implemented for geometric types in PostgreSQL using area, because my gut sense is that equality in PostgreSQL should be based on the raw values stored, and not some arbitrary transformation of them. In this case I am specifically making a distinction between how PostgreSQL should deal with points (and other geometric types) vs. how other things (ie. PostGIS) should deal with them. If PostgreSQL can deal with geometry simply and deterministically (ie. "=" means the raw values are equal, and ordering is based on the cheapest determistic function available), then that would provide a good foundation for everything else (ie. PostGIS) to define whatever makes sense in that environment based on the fundamental definition in PostgreSQL.

I don't really even like the fact that "~=" is defined for points for three reasons:
1) it is fuzzy, but no way to control the error. If you think of "~=" as being the same as "close to", coordinates on in IC chip will want to consider "close to" in the nanometer range, whereas coordinates for a house will consider "close to" in the meter range (if anybody disagrees that coordinates for a house would be in the meter range, that demonstrates the point that the definition of "close to" should not be baked into PostgreSQL because it is use case specific). Although PostgreSQL doesn't innately understand the difference between nanometers and meters, the point is for the IC chip case you may be comparing at the 6th decimal place, whereas for the house case you may want to compare at the first decimal place.
2) there is no expectation that "~=" is defined for every type (unlike "=", which is documented (incorrectly) to be defined for every type)
3) "~=" could be handled by a function which takes two points and an error value (thus allowing the user to use the error value suitable for their use case)

The idea here being that PostgreSQL doesn't even know what it is storing or in what context it is being used (as far as PostgreSQL is concerned, it is just storing bytes that fit the format of some defined type), so it shouldn't be making assumptions about what "close" means. Whereas PostGIS does know what it is storing and in what context it is being used, so PostGIS can make assumptions about what is close (but note that I have never actually used PostGIS, so this is just an assumption).

I am NOT suggesting any change to "~=" because that clearly would break backward compatibility, but I can't help feeling that PostgreSQL is building a pretty big "technical debt" by not sorting this out sooner rather than later.

Casey

-----Original Message-----
From: Emre Hasegeli [mailto:emre(at)hasegeli(dot)com]
Sent: Friday, February 17, 2017 17:21
To: Tom Lane
Cc: kcwitt(at)gmail(dot)com; PostgreSQL Bugs
Subject: Re: [BUGS] BUG #14546: "point" type does not work with "IS DISTINCT"

> Nobody's really wanted to break backwards compatibility enough to
> bring some sanity to that mess.

I am trying to bring some sanity to that mess:

https://www.postgresql.org/message-id/flat/CAE2gYzwwxPWbzxY3mtN4WL7W0DCkWo8gnB2ThUHU2XQ9XwgHMg%40mail.gmail.com

Any comment helps to keep the discussion moving.

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Linas Vepstas 2017-02-18 03:52:57 Re: BUG #14494: Regression - Null arrays are not queryable
Previous Message Casey Witt 2017-02-18 01:55:07 Re: BUG #14546: "point" type does not work with "IS DISTINCT"