| From: | Melvin Davidson <melvin6925(at)gmail(dot)com> | 
|---|---|
| To: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> | 
| Cc: | Andrus <kobruleht2(at)hot(dot)ee>, pgsql-general <pgsql-general(at)postgresql(dot)org> | 
| Subject: | Re: How to speed up delete where not in | 
| Date: | 2015-09-26 13:44:27 | 
| Message-ID: | CANu8FizEfeqiXb8pb9M1tApAZop0_tqyng6ieqKMUouNs1aYbA@mail.gmail.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-general | 
In addition to the previous recommendation, make sure you have an index on
dokumnr in table omrid.
EG:
CREATE INDEX omrid_dokumnr_fk ON omrid
  USING BTREE (dokumnr);
On Sat, Sep 26, 2015 at 7:33 AM, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
wrote:
> On 26 September 2015 at 19:53, Andrus <kobruleht2(at)hot(dot)ee> wrote:
>
>> Hi!
>>
>>
>> I'm looking for a way to delete records which do not have child rows on
>> big tables where lot of rows needs to be deleted. Both tables have lot of
>> other foreign key references.
>>
>>
>> Document headers are in omdok table:
>>
>>    create table omdok ( dokumnr serial primary key, ... );
>>
>> Document rows are in omrid table
>>
>>    CREATE TABLE omrid
>>    (
>>      id serial NOT NULL,
>>      reanr serial NOT NULL,
>>      dokumnr integer NOT NULL,
>>      CONSTRAINT omrid_pkey PRIMARY KEY (id),
>>      CONSTRAINT omrid_dokumnr_fkey FOREIGN KEY (dokumnr)
>>          REFERENCES omdok (dokumnr) MATCH SIMPLE
>>          ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY
>> IMMEDIATE,
>>      ....
>>    );
>>
>>
>> I tried
>>
>>    delete from omdok where dokumnr not in      (select dokumnr from omrid)
>>
>> Query it is running currently 15 hours and is still running.
>> postgres.exe is using 50% CPU  all the time (this is 2 core CPU).
>>
>>    explain delete from omdok where dokumnr not in      (select dokumnr
>> from omrid)
>>
>> returns:
>>
>>    "Delete  (cost=0.00..21971079433.34 rows=220815 width=6)"
>>    "  ->  Seq Scan on omdok  (cost=0.00..21971079433.34 rows=220815
>> width=6)"
>>    "        Filter: (NOT (SubPlan 1))"
>>    "        SubPlan 1"
>>    "          ->  Materialize  (cost=0.00..94756.92 rows=1897261 width=4)"
>>    "                ->  Seq Scan on omrid  (cost=0.00..77858.61
>> rows=1897261 width=4)"
>>
>> - How to delete parents without child rows fast?
>>
>
> if you write the query as with a NOT EXISTS, rather than a NOT IN() it
> should complete much more quickly.
>
> It's important to know that the semantics of NOT IN are likely not at all
> what you think:
>
> For example, in the following query, would you expect it to return the row
> with a.a = 3 ?
>
> select * from (values(1),(2),(3)) a(a) WHERE a NOT IN(SELECT b FROM
> (VALUES(NULL),(1),(2)) b(b));
>
> The presence of NULL causes this to not behave the way you might think,
> yet it works exactly the way the SQL standard dictates.
>
> You could think of this as equivalent to writing:
>
> select * from (values(1),(2),(3)) a(a) WHERE a.a <> NULL AND a.a <> 1 AND
> a.a <> 2;
>
> And since a.a <> NULL is 'unknown', this causes the entire WHERE clause to
> be false, therefore nothing matches.
>
> For this reason PostgreSQL does not optimise NOT IN() the same way as it
> optimises NOT EXISTS().
>
> If you write the query as:
>
> delete from omdok where not exists(select 1 from omrid where omdok.dokumnr
> = omrid.dokumnr);
>
> then you might see it execute in a few seconds. Perhaps you should
> consider cancelling the current query, perhaps perform a VACUUM on omdoc
> after cancelling, and then run the NOT EXISTS version instead.
>
> I mocked up your tables locally, and populated them with the same number
> of records as your row estimates in the EXPLAIN you pasted and I got:
>
> test=# delete from omdok where not exists (select 1 from omrid where
> omdok.dokumn = omrid.dokumnr);
> DELETE 0
> Time: 1698.233 ms
>
> Whereas with the NOT IN() I cancelled it after 10 minutes.
>
> Regards
>
> David Rowley
>
> --
>  David Rowley                   http://www.2ndQuadrant.com/
> <http://www.2ndquadrant.com/>
>  PostgreSQL Development, 24x7 Support, Training & Services
>
>
-- 
*Melvin Davidson*
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Nikolai Zhubr | 2015-09-26 16:17:39 | Server-side hooks for user session start and session end | 
| Previous Message | David Rowley | 2015-09-26 11:33:18 | Re: How to speed up delete where not in |