Re: altering a column's collation leaves an invalid foreign key

From: jian he <jian(dot)universality(at)gmail(dot)com>
To: Peter Eisentraut <peter(at)eisentraut(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Paul Jungwirth <pj(at)illuminatedcomputing(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: altering a column's collation leaves an invalid foreign key
Date: 2024-07-16 08:29:00
Message-ID: CACJufxEW6OMBqt8cbr=3Jt++Zd_SL-4YDjfk7Q7DhGKiSLcu4g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jun 18, 2024 at 4:50 PM Peter Eisentraut <peter(at)eisentraut(dot)org> wrote:
>
> On 08.06.24 06:14, jian he wrote:
> > if FK is nondeterministic, then it looks PK more like FK.
> > the following example, one FK row is referenced by two PK rows.
> >
> > DROP TABLE IF EXISTS fktable, pktable;
> > CREATE TABLE pktable (x text COLLATE "C" PRIMARY KEY);
> > CREATE TABLE fktable (x text COLLATE ignore_accent_case REFERENCES
> > pktable on update cascade on delete cascade);
> > INSERT INTO pktable VALUES ('A'), ('Å');
> > INSERT INTO fktable VALUES ('A');
>
> Yes, this is a problem. The RI checks are done with the collation of
> the primary key.
>
> The comment at ri_GenerateQualCollation() says "the SQL standard
> specifies that RI comparisons should use the referenced column's
> collation". But this is not what it says in my current copy.
>
> ... [ digs around ISO archives ] ...
>
> Yes, this was changed in SQL:2016 to require the collation on the PK
> side and the FK side to match at constraint creation time. The argument
> made is exactly the same we have here. This was actually already the
> rule in SQL:1999 but was then relaxed in SQL:2003 and then changed back
> because it was a mistake.
>
> We probably don't need to enforce this for deterministic collations,
> which would preserve some backward compatibility.
>
> I'll think some more about what steps to take to solve this and what to
> do about back branches etc.
>

I have come up with 3 corner cases.

---case1. not ok. PK indeterministic, FK default
DROP TABLE IF EXISTS fktable, pktable;
CREATE TABLE pktable (x text COLLATE ignore_accent_case PRIMARY KEY);
CREATE TABLE fktable (x text REFERENCES pktable on update cascade on
delete cascade);
INSERT INTO pktable VALUES ('A');
INSERT INTO fktable VALUES ('a');
INSERT INTO fktable VALUES ('A');

RI_FKey_check (Check foreign key existence ) querybuf.data is
SELECT 1 FROM ONLY "public"."pktable" x WHERE "x"
OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x
in here, fktable doesn't have collation.
we cannot rewrite to
SELECT 1 FROM ONLY "public"."pktable" x WHERE "x"
OPERATOR(pg_catalog.=) $1 collate "C" FOR KEY SHARE OF x
so assumption (only one referenced row for any referencing row) will
break when inserting values to fktable.
RI_FKey_check already allows invalidate values to happen, not sure how
ri_GenerateQualCollation can help.
overall i don't know how to stop invalid value while inserting value
to fktable in this case.

---case2. not ok case PK deterministic, FK indeterministic
DROP TABLE IF EXISTS fktable, pktable;
CREATE TABLE pktable (x text COLLATE "C" PRIMARY KEY);
CREATE TABLE fktable (x text COLLATE ignore_accent_case REFERENCES
pktable on update cascade on delete cascade);
INSERT INTO pktable VALUES ('A'), ('Å');
INSERT INTO fktable VALUES ('A');
begin; update pktable set x = 'B' where x = 'Å'; table fktable; rollback;
begin; update pktable set x = 'B' where x = 'A'; table fktable; rollback;

when cascade update fktable, in RI_FKey_cascade_upd
we can use pktable's collation. but again, a query updating fktable
only, using pktable collation seems strange.

---case3. ---not ok case PK indeterministic, FK deterministic
DROP TABLE IF EXISTS fktable, pktable;
CREATE TABLE pktable (x text COLLATE ignore_accent_case PRIMARY KEY);
CREATE TABLE fktable (x text collate "C" REFERENCES pktable on update
cascade on delete cascade);
INSERT INTO pktable VALUES ('A');
INSERT INTO fktable VALUES ('A');
INSERT INTO fktable VALUES ('a');
begin; update pktable set x = 'Å'; table fktable; rollback;

when we "INSERT INTO fktable VALUES ('a');"
we can disallow this invalid query in RI_FKey_check by constructing
the following stop query
SELECT 1 FROM ONLY public.pktable x WHERE x OPERATOR(pg_catalog.=) 'a'
collate "C" FOR KEY SHARE OF x;
but this query associated PK table with fktable collation seems weird?

summary:
case1 seems hard to resolve
case2 can be resolved in ri_GenerateQualCollation, not 100% sure.
case3 can be resolved in RI_FKey_check
because case1 is hard to resolve;
so overall I feel like erroring out PK indeterministic and FK
indeterministic while creating foreign keys is easier.

We can mandate foreign keys and primary key columns be deterministic
in ATAddForeignKeyConstraint.
The attached patch does that.

That means src/test/regress/sql/collate.icu.utf8.sql table test10pk,
table test11pk will have a big change.
so only attach src/backend/commands/tablecmds.c changes.

Attachment Content-Type Size
make_pk_fk_tie_collation_deterministic.diff text/x-patch 1.8 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Luzanov 2024-07-16 08:53:46 Re: Things I don't like about \du's "Attributes" column
Previous Message Kisoon Kwon 2024-07-16 08:25:50 Re: [ pg_ctl ] Review Request for Adding Pre-check User Script Feature