Re: trying to delete most of the table by range of date col

From: Mariel Cherkassky <mariel(dot)cherkassky(at)gmail(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Andreas Kretschmer <andreas(at)a-kretschmer(dot)de>, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: trying to delete most of the table by range of date col
Date: 2018-09-03 10:25:04
Message-ID: CA+t6e1=Q-am1hZenTRH3yJKMJZOeF1Z+KG4RPqX2gz4u8vU8Yw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin pgsql-performance

I checked, the results :

1)explain (analyze,buffers) delete from my_table where end_date <=
to_date('12/12/2018','DD/MM/YYYY') and end_date >
to_date('11/12/2018','DD/MM/YYYY');

QUERY
PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Delete on my_table (cost=0.00..97294.80 rows=1571249 width=6) (actual
time=4706.791..4706.791 rows=0 loops=1)
Buffers: shared hit=3242848
-> Seq Scan on my_table (cost=0.00..97294.80 rows=1571249 width=6)
(actual time=0.022..2454.686 rows=1572864 loops=1)
Filter: ((end_date <= to_date('12/12/2018'::text,
'DD/MM/YYYY'::text)) AND (end_date > to_date('11/12/2018'::text,
'DD/MM/YYYY'::text)))
Rows Removed by Filter: 40253
Buffers: shared hit=65020(*8k/1024)=507MB
Planning time: 0.182 ms

2)explain (analyze,buffers) DELETE FROM my_table WHERE id IN (select id
from my_table where end_date <= to_date('12/12/2018','DD/MM/YYYY') and
end_date > to_date('11/12/2018','DD/MM/YYYY'));

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Delete on my_table (cost=108908.17..252425.01 rows=1559172 width=12)
(actual time=11168.090..11168.090 rows=0 loops=1)
Buffers: shared hit=3307869 dirtied=13804, temp read=13656 written=13594
-> Hash Join (cost=108908.17..252425.01 rows=1559172 width=12) (actual
time=1672.222..6401.288 rows=1572864 loops=1)
Hash Cond: (my_table_1.id = my_table.id)
Buffers: shared hit=130040, temp read=13656 written=13594
-> Seq Scan on my_table my_table_1 (cost=0.00..97075.26
rows=1559172 width=14) (actual time=0.008..2474.671 rows=1572864 loops=1)
Filter: ((end_date <= to_date('12/12/2018'::text,
'DD/MM/YYYY'::text)) AND (end_date > to_date('11/12/2018'::text,
'DD/MM/YYYY'::text)))
Rows Removed by Filter: 40253
Buffers: shared hit=65020
-> Hash (cost=81047.63..81047.63 rows=1602763 width=14) (actual
time=1671.613..1671.613 rows=1613117 loops=1)
Buckets: 131072 Batches: 32 Memory Usage: 3392kB
Buffers: shared hit=65020, temp written=6852
-> Seq Scan on my_table (cost=0.00..81047.63 rows=1602763
width=14) (actual time=0.003..778.311 rows=1613117 loops=1)
Buffers: shared hit=65020

3)explain (analyze,buffers) DELETE FROM my_table my_table USING id_test
WHERE my_table.id = id_test.id;

QUERY
PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Delete on my_table my_table (cost=109216.05..178743.05 rows=1572960
width=12) (actual time=7307.465..7307.465 rows=0 loops=1)
Buffers: shared hit=3210748, local hit=6960, temp read=13656
written=13594
-> Hash Join (cost=109216.05..178743.05 rows=1572960 width=12) (actual
time=1636.744..4489.246 rows=1572864 loops=1)
Hash Cond: (id_test.id = my_table.id)
Buffers: shared hit=65020, local hit=6960, temp read=13656
written=13594
-> Seq Scan on id_test(cost=0.00..22689.60 rows=1572960 width=14)
(actual time=0.009..642.859 rows=1572864 loops=1)
Buffers: local hit=6960
-> Hash (cost=81160.02..81160.02 rows=1614002 width=14) (actual
time=1636.228..1636.228 rows=1613117 loops=1)
Buckets: 131072 Batches: 32 Memory Usage: 3392kB
Buffers: shared hit=65020, temp written=6852
-> Seq Scan on my_table my_table (cost=0.00..81160.02
rows=1614002 width=14) (actual time=0.297..815.133 rows=1613117 loops=1)
Buffers: shared hit=65020

I restarted the cluster after running every query.

‫בתאריך יום ב׳, 3 בספט׳ 2018 ב-12:23 מאת ‪Justin Pryzby‬‏ <‪
pryzby(at)telsasoft(dot)com‬‏>:‬

> On Mon, Sep 03, 2018 at 11:17:58AM +0300, Mariel Cherkassky wrote:
> > Hi,
> > I already checked and on all the tables that uses the id col of the main
> > table as a foreign key have index on that column.
> >
> > So, it seems that the second solution is the fastest one. It there a
> reason
> > why the delete chunks (solution 4) wasnt faster?
>
> I suggest running:
>
> SET track_io_timing=on; -- requires superuser
> explain(ANALYZE,BUFFERS) DELETE [...]
>
> https://wiki.postgresql.org/wiki/Slow_Query_Questions
>
> Maybe you just need larger shared_buffers ?
>
> Justin
>

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message Jammie 2018-09-03 10:33:54 postgresql Logical Replication Stream fails with “Database connection failed when reading from copy”
Previous Message Justin Pryzby 2018-09-03 09:23:01 Re: trying to delete most of the table by range of date col

Browse pgsql-performance by date

  From Date Subject
Next Message Carrie Berlin 2018-09-03 11:09:35 Re: trying to delete most of the table by range of date col
Previous Message Justin Pryzby 2018-09-03 09:23:01 Re: trying to delete most of the table by range of date col