Re: altering a column's collation leaves an invalid foreign key

From: jian he <jian(dot)universality(at)gmail(dot)com>
To: Peter Eisentraut <peter(at)eisentraut(dot)org>
Cc: Paul Jungwirth <pj(at)illuminatedcomputing(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: altering a column's collation leaves an invalid foreign key
Date: 2024-10-25 14:26:48
Message-ID: CACJufxE=f=m21fx5rmr=w7oySw+aXq=JWLKJK10tq5AJoRjCVg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Oct 25, 2024 at 2:27 PM jian he <jian(dot)universality(at)gmail(dot)com> wrote:
>

> /*
> * ri_GenerateQualCollation --- add a COLLATE spec to a WHERE clause
> *
> - * At present, we intentionally do not use this function for RI queries that
> - * compare a variable to a $n parameter. Since parameter symbols always have
> - * default collation, the effect will be to use the variable's collation.
> - * Now that is only strictly correct when testing the referenced column, since
> - * the SQL standard specifies that RI comparisons should use the referenced
> - * column's collation. However, so long as all collations have the same
> - * notion of equality (which they do, because texteq reduces to bitwise
> - * equality), there's no visible semantic impact from using the referencing
> - * column's collation when testing it, and this is a good thing to do because
> - * it lets us use a normal index on the referencing column. However, we do
> - * have to use this function when directly comparing the referencing and
> - * referenced columns, if they are of different collations; else the parser
> - * will fail to resolve the collation to use.
> + * We only have to use this function when directly comparing the referencing
> + * and referenced columns, if they are of different collations; else the
> + * parser will fail to resolve the collation to use. We don't need to use
> + * this function for RI queries that compare a variable to a $n parameter.
> + * Since parameter symbols always have default collation, the effect will be
> + * to use the variable's collation.
> + *
> + * Note that we require that the collations of the referencing and the
> + * referenced column have the some notion of equality: Either they have to
> + * both be deterministic or else they both have to be the same.
> */

drop table if exists pktable, fktable;
CREATE TABLE pktable (x text COLLATE "POSIX" PRIMARY KEY);
CREATE TABLE fktable (x text COLLATE "C" REFERENCES pktable on update
cascade on delete cascade);
INSERT INTO pktable VALUES ('A'), ('Å');
INSERT INTO fktable VALUES ('A');

update pktable set x = 'a' collate "C" where x = 'A' collate "POSIX";

the cascade update fktable query string is:
UPDATE ONLY "public"."fktable" SET "x" = $1 WHERE $2 OPERATOR(pg_catalog.=) "x"
ideally it should be
UPDATE ONLY "public"."fktable" SET "x" = $1 collate "C" WHERE $2
collate "POSIX" OPERATOR(pg_catalog.=) "x"

as we already mentioned in several places: PK-FK tie either they have to
both be deterministic or else they both have to be the same collation
oid.
so the reduction to
UPDATE ONLY "public"."fktable" SET "x" = $1 WHERE $2 OPERATOR(pg_catalog.=) "x"
is safe.
but you look at SPI_execute_snapshot, _SPI_convert_params. then we can see
the collation metadata is not keeped.

> + We don't need to use
> + * this function for RI queries that compare a variable to a $n parameter.
> + * Since parameter symbols always have default collation, the effect will be
> + * to use the variable's collation.

so I think a better description is
> + We don't need to use
> + * this function for RI queries that compare a variable to a $n parameter.
> + * Since parameter symbols don't have collation information, the effect will be
> + * to use the variable's collation.

you can see related discovery in
https://www.postgresql.org/message-id/CACJufxEtPBWAk7nEn69ww2LKi9w1i4dLwd5gnjD1DQ2vaYoi2g%40mail.gmail.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Melanie Plageman 2024-10-25 14:29:09 Re: Can rs_cindex be < 0 for bitmap heap scans?
Previous Message Shubham Khanna 2024-10-25 14:10:56 Re: Pgoutput not capturing the generated columns