From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | romanf(at)fusemail(dot)com |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Deleting orphaned records to establish Ref Integrity |
Date: | 2005-06-02 06:36:09 |
Message-ID: | 10442.1117694169@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
"Roman F" <romanf(at)fusemail(dot)com> writes:
> ... Executing something like
> the following statement would work, but even with indexes it takes an
> insane amount of time to execute for each of the tables:
> DELETE FROM child_table WHERE parentid NOT IN
> (SELECT parentid FROM parent_table)
Uh, what sort of query plan are you getting for that?
PG 7.4 and up can do a reasonable job with NOT IN if the sub-select is
small enough to fit into an in-memory hash table (of size sort_mem).
I'm betting that your sort_mem setting is not high enough to encourage
the planner to try the hash method. You could try increasing sort_mem
... but given the size of your tables, you might end up with a hash
table large enough to drive the system into swapping, in which case
it'll still be mighty slow.
Another idea is to try an outer join:
SELECT child_table.parentid INTO tmp_table
FROM child_table LEFT JOIN parent_table
ON (child_table.parentid = parent_table.parentid)
WHERE parent_table.parentid IS NULL;
which essentially does a join and then pulls out just the child_table
rows that failed to match. This will probably end up getting done via a
merge join or hybrid hash join, either of which are more scalable than
the NOT IN code. You still have to do the actual deletions in
child_table, but as long as there aren't too many, a NOT IN using
tmp_table should work OK.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Dinesh Pandey | 2005-06-02 07:04:58 | index row size 2728 exceeds btree maximum, 2713 |
Previous Message | Joachim Zobel | 2005-06-02 06:33:03 | Limits of SQL |