Collation order for btree-indexable datatypes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Collation order for btree-indexable datatypes
Date: 2001-05-02 21:38:03
Message-ID: 16792.988839483@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

To avoid getting into states where a btree index is corrupt (or appears
that way), it is absolutely critical that the datatype provide a unique,
consistent sort order. In particular, the operators = <> < <= > >= had
better all agree with each other and with the 3-way-comparison support
function about the ordering of any two non-NULL data values.

After tracing some Assert failures in the new planner statistics code
I'm working on, I have realized that several of our existing datatypes
fail to meet this fundamental requirement, and therefore are prone to
serious misbehavior when trying to index "weird" values. In particular,
type NUMERIC does not return consistent results for comparisons
involving "NaN" values, and several of the date/time types do not return
consistent results for comparisons involving "INVALID" values.
(Example: numeric_cmp will assert that two NaNs are equal, whereas
numeric_eq will assert that they aren't. Worse, numeric_cmp will assert
that a NaN is equal to any non-NaN, too. The date/time routines avoid
the latter mistake but make the former one.)

I am planning to fix this by ensuring that all these operations agree
on an (arbitrarily chosen) sort order for the "weird" values of these
types. What I'm wondering about is whether to insert the fixes into
7.1.1 or wait for 7.2. In theory changing the sort order might break
existing user indexes, and should therefore be avoided until an initdb
is needed. But: any indexes that contain these values are likely broken
already, since in fact we don't have a well-defined sort order right now
for these values.

A closely related problem is that the "current time" special value
supported by several of the date/time datatypes is inherently not
compatible with being indexed, since its sort order relative to
ordinary time values keeps changing. We had discussed removing this
special case, and I think agreed to do so, but it hasn't happened yet.

What I'm inclined to do is force consistency of the comparison operators
now (for 7.1.1) and then remove "current time" for 7.2, but perhaps it'd
be better to leave the whole can of worms alone until 7.2. Comments
anyone?

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2001-05-02 22:32:39 Re: \c connects as another user instead I want in psql
Previous Message Bruce Momjian 2001-05-02 21:36:45 Re: New Linux xfs/reiser file systems