Re: How to speed up delete where not in

From: Melvin Davidson <melvin6925(at)gmail(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Andrus <kobruleht2(at)hot(dot)ee>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: How to speed up delete where not in
Date: 2015-09-26 13:44:27
Message-ID: CANu8FizEfeqiXb8pb9M1tApAZop0_tqyng6ieqKMUouNs1aYbA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

In addition to the previous recommendation, make sure you have an index on
dokumnr in table omrid.

EG:
CREATE INDEX omrid_dokumnr_fk ON omrid
USING BTREE (dokumnr);

On Sat, Sep 26, 2015 at 7:33 AM, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
wrote:

> On 26 September 2015 at 19:53, Andrus <kobruleht2(at)hot(dot)ee> wrote:
>
>> Hi!
>>
>>
>> I'm looking for a way to delete records which do not have child rows on
>> big tables where lot of rows needs to be deleted. Both tables have lot of
>> other foreign key references.
>>
>>
>> Document headers are in omdok table:
>>
>> create table omdok ( dokumnr serial primary key, ... );
>>
>> Document rows are in omrid table
>>
>> CREATE TABLE omrid
>> (
>> id serial NOT NULL,
>> reanr serial NOT NULL,
>> dokumnr integer NOT NULL,
>> CONSTRAINT omrid_pkey PRIMARY KEY (id),
>> CONSTRAINT omrid_dokumnr_fkey FOREIGN KEY (dokumnr)
>> REFERENCES omdok (dokumnr) MATCH SIMPLE
>> ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY
>> IMMEDIATE,
>> ....
>> );
>>
>>
>> I tried
>>
>> delete from omdok where dokumnr not in (select dokumnr from omrid)
>>
>> Query it is running currently 15 hours and is still running.
>> postgres.exe is using 50% CPU all the time (this is 2 core CPU).
>>
>> explain delete from omdok where dokumnr not in (select dokumnr
>> from omrid)
>>
>> returns:
>>
>> "Delete (cost=0.00..21971079433.34 rows=220815 width=6)"
>> " -> Seq Scan on omdok (cost=0.00..21971079433.34 rows=220815
>> width=6)"
>> " Filter: (NOT (SubPlan 1))"
>> " SubPlan 1"
>> " -> Materialize (cost=0.00..94756.92 rows=1897261 width=4)"
>> " -> Seq Scan on omrid (cost=0.00..77858.61
>> rows=1897261 width=4)"
>>
>> - How to delete parents without child rows fast?
>>
>
> if you write the query as with a NOT EXISTS, rather than a NOT IN() it
> should complete much more quickly.
>
> It's important to know that the semantics of NOT IN are likely not at all
> what you think:
>
> For example, in the following query, would you expect it to return the row
> with a.a = 3 ?
>
> select * from (values(1),(2),(3)) a(a) WHERE a NOT IN(SELECT b FROM
> (VALUES(NULL),(1),(2)) b(b));
>
> The presence of NULL causes this to not behave the way you might think,
> yet it works exactly the way the SQL standard dictates.
>
> You could think of this as equivalent to writing:
>
> select * from (values(1),(2),(3)) a(a) WHERE a.a <> NULL AND a.a <> 1 AND
> a.a <> 2;
>
> And since a.a <> NULL is 'unknown', this causes the entire WHERE clause to
> be false, therefore nothing matches.
>
> For this reason PostgreSQL does not optimise NOT IN() the same way as it
> optimises NOT EXISTS().
>
> If you write the query as:
>
> delete from omdok where not exists(select 1 from omrid where omdok.dokumnr
> = omrid.dokumnr);
>
> then you might see it execute in a few seconds. Perhaps you should
> consider cancelling the current query, perhaps perform a VACUUM on omdoc
> after cancelling, and then run the NOT EXISTS version instead.
>
> I mocked up your tables locally, and populated them with the same number
> of records as your row estimates in the EXPLAIN you pasted and I got:
>
> test=# delete from omdok where not exists (select 1 from omrid where
> omdok.dokumn = omrid.dokumnr);
> DELETE 0
> Time: 1698.233 ms
>
> Whereas with the NOT IN() I cancelled it after 10 minutes.
>
> Regards
>
> David Rowley
>
> --
> David Rowley http://www.2ndQuadrant.com/
> <http://www.2ndquadrant.com/>
> PostgreSQL Development, 24x7 Support, Training & Services
>
>

--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Nikolai Zhubr 2015-09-26 16:17:39 Server-side hooks for user session start and session end
Previous Message David Rowley 2015-09-26 11:33:18 Re: How to speed up delete where not in