Re: Deleting orphaned records (not exists is very slow)

From: Michael Glaesemann <grzm(at)seespotcode(dot)net>
To: Tim Uckun <timuckun(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Deleting orphaned records (not exists is very slow)
Date: 2010-09-27 23:23:42
Message-ID: FAA3A014-09A0-40B6-BB9F-BD4F171F96CB@seespotcode.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


On Sep 27, 2010, at 8:02 , Tim Uckun wrote:

> What is the best strategy for deleting orphaned records from a large table.
>
> The usual NOT IN is very slow so there must be a better way in
> postgres for dealing with these.

If the table is large, I sometimes use the following pattern:

1. Create a trigger on the referenced table (foo) to delete rows from the
referencing table (bar) when they're deleted from the referenced table.
This is a poor man's ON DELETE CASCADE and prevents any more rows from being
orphaned.

2. Create a table with the keys of the referencing table which are no longer
in the referenced table:

CREATE TABLE orphaned_bar
SELECT keycol
FROM bar
LEFT JOIN foo USING (keycol)
WHERE foo.keycol IS NULL;

keycol may be multiple columns if you've got a multi-column key.

3. You're then free to delete the rows from bar however you wish, using orphaned_bar.
You might want to do them in one go, or in batches. You'll likely want to create an
index on orphaned_bar.keycol.

You can then add your foreign key and get rid of the trigger on foo when you're done.

Hope this helps.

Michael Glaesemann
grzm seespotcode net

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Satoshi Nagayasu 2010-09-27 23:29:36 Re: ECPG - Some errno definitions don't match to the manual
Previous Message David Boreham 2010-09-27 23:13:52 Re: zero_damaged_pages doesn't work