Joins and DELETE FROM

From: "Kynn Jones" <kynnjo(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Joins and DELETE FROM
Date: 2008-03-08 17:31:30
Message-ID: c2350ba40803080931y363d6fd9oaaca11a0ea4f1b79@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi!

As part of a data warehousing project, I need to pre-process data downloaded
from an external source, in the form of several large flat files. This
preprocessing entails checking the validity of various data items, and
discarding those that fail to pass all the checks.

Currently, the code that performs the checks generates intermediate
temporary tables of those bits of data that are invalid in some way. (This
simplifies the process of generating various quality-control reports about
the incoming data).

The next step is to weed out the bad data from the main tables, and here's
where I begin to get lost.

To be concrete, suppose I have a table T consisting of 20 million rows,
keyed on some column K. (There are no formal constrains on T at the moment,
but one could define column K as T's primary key.) Suppose also that I have
a second table B (for "bad") consisting of 20 thousand rows, and also keyed
on some column K. For each value of B.K there is exactly one row in T such
that T.K = B.K, and the task is to delete all these rows from T as
efficiently as possible.

My naive approach would something like

DELETE FROM T WHERE T.K IN ( SELECT K FROM B );

...which, according to EXPLAIN, is a terrible idea, because it involves
sequentially scanning all 20 million rows of T just to delete about only
0.1% of them.

It seems to me better to sequentially scan B and rely on an index on T to
zero-in the few rows in T that must be deleted.

Is this strategy something that can be done with plain SQL (even if to do
this I must produce additional helper tables, indices, etc.), or must I
write a stored procedure to implement it?

TIA!

Kynn

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Heikki Linnakangas 2008-03-08 18:01:41 Re: Joins and DELETE FROM
Previous Message Alvaro Herrera 2008-03-08 14:17:50 Re: Re: Confirmação de envio / Sending confirmation (captchaid:13266b402f09)