Re: Allow foreign keys to reference a superset of unique columns

From: James Coleman <jtc331(at)gmail(dot)com>
To: Wolfgang Walther <walther(at)technowledgy(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kaiting Chen <ktchen14(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, hellopfm(at)gmail(dot)com
Subject: Re: Allow foreign keys to reference a superset of unique columns
Date: 2022-09-25 01:34:15
Message-ID: CAAaqYe_-rrCq-VOsvrOuTEVayubezbKEMsJTRUOBbqkCpmJ53Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 2, 2022 at 5:42 AM Wolfgang Walther <walther(at)technowledgy(dot)de> wrote:
>
> Kaiting Chen:
> > I'd like to propose a change to PostgreSQL to allow the creation of a foreign
> > key constraint referencing a superset of uniquely constrained columns.
>
> +1
>
> Tom Lane:
> > TBH, I think this is a fundamentally bad idea and should be rejected
> > outright. It fuzzes the semantics of the FK relationship, and I'm
> > not convinced that there are legitimate use-cases. Your example
> > schema could easily be dismissed as bad design that should be done
> > some other way.
>
> I had to add quite a few unique constraints on a superset of already
> uniquely constrained columns in the past, just to be able to support FKs
> to those columns. I think those cases most often come up when dealing
> with slightly denormalized schemas, e.g. for efficiency.
>
> One other use-case I had recently, was along the followling lines, in
> abstract terms:
>
> CREATE TABLE classes (class INT PRIMARY KEY, ...);
>
> CREATE TABLE instances (
> instance INT PRIMARY KEY,
> class INT REFERENCES classes,
> ...
> );
>
> Think about classes and instances as in OOP. So the table classes
> contains some definitions for different types of object and the table
> instances realizes them into concrete objects.
>
> Now, assume you have some property of a class than is best modeled as a
> table like this:
>
> CREATE TABLE classes_prop (
> property INT PRIMARY KEY,
> class INT REFERNECES classes,
> ...
> );
>
> Now, assume you need to store data for each of those classes_prop rows
> for each instance. You'd do the following:
>
> CREATE TABLE instances_prop (
> instance INT REFERENCES instances,
> property INT REFERENCES classes_prop,
> ...
> );
>
> However, this does not ensure that the instance and the property you're
> referencing in instances_prop are actually from the same class, so you
> add a class column:
>
> CREATE TABLE instances_prop (
> instance INT,
> class INT,
> property INT,
> FOREIGN KEY (instance, class) REFERENCES instances,
> FOREIGN KEY (property, class) REFERENCES classes_prop,
> ...
> );
>
> But this won't work, without creating some UNIQUE constraints on those
> supersets of the PK column first.

If I'm following properly this sounds like an overengineered EAV
schema, and neither of those things inspires me to think "this is a
use case I want to support".

That being said, I know that sometimes examples that have been
abstracted enough to share aren't always the best, so perhaps there's
something underlying this that's a more valuable example.

> > For one example of where the semantics get fuzzy, it's not
> > very clear how the extra-baggage columns ought to participate in
> > CASCADE updates. Currently, if we have
> > CREATE TABLE foo (a integer PRIMARY KEY, b integer);
> > then an update that changes only foo.b doesn't need to update
> > referencing tables, and I think we even have optimizations that
> > assume that if no unique-key columns are touched then RI checks
> > need not be made. But if you did
> > CREATE TABLE bar (x integer, y integer,
> > FOREIGN KEY (x, y) REFERENCES foo(a, b) ON UPDATE CASCADE);
> > then perhaps you expect bar.y to be updated ... or maybe you don't?
>
> In all use-cases I had so far, I would expect bar.y to be updated, too.
>
> I think it would not even be possible to NOT update bar.y, because the
> FK would then not match anymore. foo.a is the PK, so the value in bar.x
> already forces bar.y to be the same as foo.b at all times.
>
> bar.y is a little bit like a generated value in that sense, it should
> always match foo.b. I think it would be great, if we could actually go a
> step further, too: On an update to bar.x to a new value, if foo.a=bar.x
> exists, I would like to set bar.y automatically to the new foo.b.
> Otherwise those kind of updates always have to either query foo before,
> or add a trigger to do the same.

Isn't this actually contradictory to the behavior you currently have
with a multi-column foreign key? In the example above then an update
to bar.x is going to update the rows in foo that match bar.x = foo.a
and bar.y = foo.b *using the old values of bar.x and bar.y* to be the
new values. You seem to be suggesting that instead it should look for
other rows that already match the *new value* of only one of the
columns in the constraint. If I'm understanding the example correctly,
that seems like a *very* bad idea.

James Coleman

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2022-09-25 02:31:37 Re: WIP: WAL prefetch (another approach)
Previous Message James Coleman 2022-09-25 00:35:52 Re: Consider parallel for lateral subqueries with limit