Re: Efficiently delete rows not referenced by a foreign key

From: David Johnston <polobo(at)yahoo(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Efficiently delete rows not referenced by a foreign key
Date: 2014-03-03 20:50:27
Message-ID: 1393879827590-5794445.post@n5.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Evan Martin wrote
> Hi All,
>
> I have a database schema where if row is deleted from one table the rows
> it references in another table should also be deleted, unless still
> referenced by something else.
>
> Eg. Table A has foreign key to table B. When I delete a row from A I
> also want to delete the referenced row in B, unless it's still
> referenced by something else (which may be another row in A or in a
> completely different table C).
>
> The way I currently do this is to have an AFTER DELETE FOR EACH ROW
> trigger on A, which attempts to delete the row in B, but catches and
> ignores a foreign_key_violation exception. This works (the foreign keys
> don't have ON DELETE CASCADE), but it's slow when deleting many rows.
>
> A single query that deletes all the referenced rows in B, if they're not
> referenced by A or C, is much faster. The problem with that approach is
> it's error-prone to write and maintain. There may be many other tables
> that reference B and if they ever change this query has to be updated,
> which is a maintenance nightmare. I could try to auto-generate the SQL
> for it by finding foreign key constraints referencing B in
> information_schema, but that's not a trivial exercise. It also falls
> short in a more complicated scenario where I want to delete rows in
> multiple tables (A1, A2, ...) that may all reference B.
>
> Is there an easier way to do this? Postgres obviously knows about all
> the foreign keys, so is there any way to get it to do the checking for
> me? I mean a way that works in bulk, not one row at a time.
>
> Regards,
>
> Evan

Basically - if there are no foreign references to a given primary key that
key should be deleted?

First question I'd ask is: What harm is there in not deleting the row in
that situation?

If you can use an FK to ensure that such a row cannot be removed if there is
at least one related row then aggressively scanning for an removing rows in
a batch operation - much like vacuum works - seems to be a desirable option
to consider.

I do not believe PostgreSQL has the ability to expose the data you need; or
to perform such a task in bulk. What would be required is some kind of
internal storage of a "reference counter" - but the current FK check
basically runs that same single-row lookup check that you would have to
code.

I'd suggest you do the dynamic, catalog-driven, query. You can add
additional infrastructure if you want to make the routine safer but assuming
you can rely on foreign keys to avoid invalid deletions that is mostly a
luxury. You will know quite quickly if someone introduces a schema change
that brakes your code - ideally in your development and/or staging
environment. Thus the decision is whether to enforce the rule by adding
lots of triggers OR whether to periodically perform a cleanup of the
database based upon those rules. Note you can make use of an "after
statement" trigger instead of "after each row" if that proves to be more
efficient based upon your usage.

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Efficiently-delete-rows-not-referenced-by-a-foreign-key-tp5794440p5794445.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message James Harper 2014-03-03 22:09:57 Re: multiple results from a function
Previous Message Merlin Moncure 2014-03-03 20:49:36 Re: multiple results from a function