Re: BUG #17158: Distinct ROW fails with Postgres 14

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, sait(dot)nisanci(at)microsoft(dot)com, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17158: Distinct ROW fails with Postgres 14
Date: 2021-08-24 22:16:11
Message-ID: 496046.1629843371@sss.pgh.pa.us
Lists: pgsql-bugs

Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com> writes:
> On 24.08.21 11:55, David Rowley wrote:
>> If it's going to be a problem detecting the lack of hashability during
>> planning then maybe we can just add a hash opclass for BIT to fix this
>> particular case.

> The following types have btree opclasses but not hash opclasses:
> money
> bit
> bit varying
> tsvector
> tsquery
> Also among contrib:
> cube
> ltree
> seg
> We could fix the first three relatively easily (although money is used
> in test cases as not having a hash opclass). Not sure what to do about
> the rest.

We can *not* institute a policy that all types must have hash opclasses,
which is what David's suggestion amounts to.

I've been thinking some more about my upthread suggestion that we just
revert cache_record_field_properties to the way it was, and I think that
it's actually pretty defensible, i.e. the lack of prior complaints isn't
all that astonishing. If a query plan involves making comparisons
(either equality or more general ordering comparisons) on a given RECORD
column, it's pretty likely that that traces directly to a semantic
requirement of the query. So the user won't, or at least shouldn't, be
surprised to get a failure about a component type not being able to
perform the comparison. The fact that we issue the error at run time
rather than plan time is a little ugly, but it'd be the same error if
we had full knowledge at plan time. On the other hand, hashing is an
implementation choice, not a semantic requirement, so users can
reasonably expect the planner to avoid using hashing when it won't work.

This argument falls down in a situation where duplicate-elimination
could be done with either hashing or sorting and the datatype has
hashing but not ordering support. I'd argue, however, that the set of
such datatypes is darn near empty. In any case, such failures are not
regressions because they never worked before either.
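
To make the hash-versus-sort distinction concrete, here is a toy model of
the choice, not PostgreSQL's planner code. The btree-only set is taken from
the list quoted above; the "integer" and "text" entries and the function
name are illustrative assumptions.

```python
# Toy model only: this is NOT how the PostgreSQL planner is implemented.

# Types that, per the list quoted above, have btree but no hash opclass.
BTREE_ONLY = {"money", "bit", "bit varying", "tsvector", "tsquery"}

# Hypothetical stand-ins for types that support both sorting and hashing.
BTREE_AND_HASH = {"integer", "text"}


def dedup_strategies(field_types):
    """Return the duplicate-elimination strategies usable for a row
    whose fields have the given types, in preference order."""
    strategies = []
    # Hashing is only an option if every field type can be hashed.
    if all(t in BTREE_AND_HASH for t in field_types):
        strategies.append("hash")
    # Sorting is only an option if every field type can be ordered.
    if all(t in BTREE_ONLY or t in BTREE_AND_HASH for t in field_types):
        strategies.append("sort")
    return strategies


print(dedup_strategies(["integer", "text"]))  # ['hash', 'sort']
print(dedup_strategies(["integer", "bit"]))   # ['sort']
```

In this model a row containing a bit field can still be de-duplicated, just
not by hashing; an empty result corresponds to the case where the query
cannot be executed under either strategy.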

Undoing that would lose v14's ability to select hashed duplicate
elimination for RECORD columns, but that's still not a regression
because we didn't have it before. Moreover, anyone who's unhappy can
work around the problem by explicitly casting the column to some
suitable named composite type. We can leave it for later to make the
planner smarter about anonymous record types. It clearly could be
smarter, at least for the case of an explicit ROW construct at top
level; but now is no time to be writing such code for v14.
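
A rough sketch of why the cast workaround helps, under the assumption that
the reverted behavior treats an anonymous record as "not known to be
hashable" while a named composite type can have its fields inspected up
front. The class, function, and type-set names are hypothetical and are
not taken from the PostgreSQL source.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical stand-ins for hashable types; bit is deliberately absent
# because, per the list quoted above, it has no hash opclass.
HASHABLE = {"integer", "text"}


@dataclass(frozen=True)
class RowType:
    # None models an anonymous RECORD: its fields are not known up front.
    field_types: Optional[Tuple[str, ...]]


def hashable_at_plan_time(row: RowType) -> bool:
    """Toy model: may hashed duplicate elimination be chosen for this row type?"""
    if row.field_types is None:
        # Anonymous record: never assume hashing will work.
        return False
    # Named composite type: hashable only if every field type is.
    return all(t in HASHABLE for t in row.field_types)


anonymous = RowType(field_types=None)
named_ok = RowType(field_types=("integer", "text"))
named_bit = RowType(field_types=("integer", "bit"))

print(hashable_at_plan_time(anonymous))  # False
print(hashable_at_plan_time(named_ok))   # True
print(hashable_at_plan_time(named_bit))  # False
```

Under these assumptions, casting to a named composite type gives the
planner enough information to pick hashing when it is safe and to fall
back to sorting when a field such as bit cannot be hashed.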

regards, tom lane
