Quick Links

Re: Deleting orphaned records to establish Ref Integrity

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	romanf(at)fusemail(dot)com
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Deleting orphaned records to establish Ref Integrity
Date:	2005-06-02 06:36:09
Message-ID:	10442.1117694169@sss.pgh.pa.us
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

"Roman F" <romanf(at)fusemail(dot)com> writes:
> ... Executing something like
> the following statement would work, but even with indexes it takes an
> insane amount of time to execute for each of the tables:

> DELETE FROM child_table WHERE parentid NOT IN
> (SELECT parentid FROM parent_table)

Uh, what sort of query plan are you getting for that?

PG 7.4 and up can do a reasonable job with NOT IN if the sub-select is
small enough to fit into an in-memory hash table (of size sort_mem).
I'm betting that your sort_mem setting is not high enough to encourage
the planner to try the hash method. You could try increasing sort_mem
... but given the size of your tables, you might end up with a hash
table large enough to drive the system into swapping, in which case
it'll still be mighty slow.

Another idea is to try an outer join:

SELECT child_table.parentid INTO tmp_table
FROM child_table LEFT JOIN parent_table
ON (child_table.parentid = parent_table.parentid)
WHERE parent_table.parentid IS NULL;

which essentially does a join and then pulls out just the child_table
rows that failed to match. This will probably end up getting done via a
merge join or hybrid hash join, either of which are more scalable than
the NOT IN code. You still have to do the actual deletions in
child_table, but as long as there aren't too many, a NOT IN using
tmp_table should work OK.

regards, tom lane

In response to

Deleting orphaned records to establish Ref Integrity at 2005-06-02 01:28:38 from Roman F

Responses

Re: Deleting orphaned records to establish Ref Integrity at 2005-06-02 18:02:10 from Greg Stark

Browse pgsql-general by date

	From	Date	Subject
Next Message	Dinesh Pandey	2005-06-02 07:04:58	index row size 2728 exceeds btree maximum, 2713
Previous Message	Joachim Zobel	2005-06-02 06:33:03	Limits of SQL