trying to delete most of the table by range of date col

From: Mariel Cherkassky <mariel(dot)cherkassky(at)gmail(dot)com>
To: PostgreSQL mailing lists <pgsql-performance(at)postgresql(dot)org>, pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: trying to delete most of the table by range of date col
Date: 2018-09-03 06:27:52
Message-ID: CA+t6e1krNiXeqbXPi7pSYA8vvu7KsRCcE8yexzFBGrnBhS7sZg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin pgsql-performance

Hi,
I have a big table (with 1.6 milion records). One of the columns is called
end_date and it`s type is timestamp. I'm trying to find the best way to
delete most of the table but not all of it according to a range of dates.
The table structure :
afa=# \d my_table;
Table "public.my_table"
Column | Type |
Modifiers
---------------------------------+--------------------------+----------------------------------------------------------
id | bigint | not null
default nextval('my_table_id_seq'::regclass)
devid| integer | not null
column_name| integer | not null
column_name| integer | not null
column_name| integer | not null
column_name| timestamp with time zone |
column_name| integer | not null
column_name| integer | not null
column_name| integer | not null
column_name| integer | not null
column_name| integer | not null
column_name| integer | not null
column_name| integer | not null
column_name| text | not null
column_name| integer | not null default 0
column_name| integer | not null default 0
column_name| integer | not null default 0
column_name| integer | not null default 0
column_name| integer | not null default 0
column_name| integer | not null default 0
column_name| integer | not null default 0
column_name| integer | default 0
column_name| integer | default 0
column_name| integer | default 0
column_name| integer | default 0
column_name| integer | default 0
column_name| integer | default 0
end_date | timestamp with time zone |

Indexes:
"my_table_pkey" PRIMARY KEY, btree (id)
"my_table_date_idx" btree (date)
"my_table_device_idx" btree (devid)
"end_date_idx" btree (end_date)
Foreign-key constraints:
"fk_aaaaa" FOREIGN KEY (devid) REFERENCES device_data(id)
Referenced by:
TABLE "table1" CONSTRAINT "application_change_my_table_id_fkey" FOREIGN
KEY (my_table_id) REFERENCES my_table(id)
TABLE "table2" CONSTRAINT "configuration_changes_my_table_id_fkey"
FOREIGN KEY (my_table_id) REFERENCES my_table(id)
TABLE "table3" CONSTRAINT "fk_57hmvnx423bw9h203260r8gic" FOREIGN KEY
(my_table) REFERENCES my_table(id)
TABLE "table3" CONSTRAINT "interface_change_my_table_fk" FOREIGN KEY
(my_table) REFERENCES my_table(id)
TABLE "table4" CONSTRAINT "my_table_id_fkey" FOREIGN KEY (my_table_id)
REFERENCES my_table(id) ON DELETE CASCADE
TABLE "table5" CONSTRAINT "my_table_report_my_table_fk" FOREIGN KEY
(my_table_id) REFERENCES my_table(id)
TABLE "table6" CONSTRAINT
"my_table_to_policy_change_my_table_foreign_key" FOREIGN KEY (my_table)
REFERENCES my_table(id)
TABLE "table7" CONSTRAINT "network_object_change_my_table_id_fkey"
FOREIGN KEY (my_table_id) REFERENCES my_table(id)
TABLE "table8" CONSTRAINT "orig_nat_rule_change_my_table_id_fkey"
FOREIGN KEY (my_table_id) REFERENCES my_table(id)
TABLE "table9" CONSTRAINT "risk_change_my_table_id_fkey" FOREIGN KEY
(my_table_id) REFERENCES my_table(id)
TABLE "table10" CONSTRAINT "rule_change_my_table_id_fkey" FOREIGN KEY
(my_table_id) REFERENCES my_table(id)
TABLE "table11" CONSTRAINT "service_change_my_table_id_fkey" FOREIGN
KEY (my_table_id) REFERENCES my_table(id)

As you can see alot of other tables uses the id col as a foreign key which
make the delete much slower.

*Solution I tried for the query : *

delete from my_table where end_date <= to_date('12/12/2018','DD/MM/YYYY')
and end_date > to_date('11/12/2018','DD/MM/YYYY');

QUERY
PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Delete on my_table (cost=0.00..65183.30 rows=1573862 width=6) (actual
time=5121.344..5121.344 rows=0 loops=1)
-> Seq Scan on my_table (cost=0.00..65183.30 rows=1573862 width=6)
(actual time=0.012..2244.393 rows=1572864 loops=1)
Filter: ((end_date <= to_date('12/12/2018'::text,
'DD/MM/YYYY'::text)) AND (end_date > to_date('11/12/2018'::text,
'DD/MM/YYYY'::text)))
Rows Removed by Filter: 40253
Planning time: 0.210 ms
Trigger for constraint table1: time=14730.816 calls=1572864
Trigger for constraint table2: time=30718.084 calls=1572864
Trigger for constraint table3: time=28170.363 calls=1572864
Trigger for constraint table4: time=29573.681 calls=1572864
Trigger for constraint table5: time=29629.263 calls=1572864
Trigger for constraint table6: time=29628.489 calls=1572864
Trigger for constraint table7: time=29798.121 calls=1572864
Trigger for constraint table8: time=29645.705 calls=1572864
Trigger for constraint table9: time=29657.177 calls=1572864
Trigger for constraint table10: time=29487.054 calls=1572864
Trigger for constraint table11: time=30010.978 calls=1572864
Trigger for constraint table12: time=26383.924 calls=1572864
Execution time: 350603.047 ms
(18 rows)

-----------------------

DELETE FROM my_table WHERE id IN (select id from my_table where end_date <=
to_date('12/12/2018','DD/MM/YYYY') and end_date >
to_date('11/12/2018','DD/MM/YYYY'));

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Delete on my_table (cost=92522.54..186785.27 rows=1572738 width=12)
(actual time=9367.477..9367.477 rows=0 loops=1)
-> Hash Join (cost=92522.54..186785.27 rows=1572738 width=12) (actual
time=2871.906..5503.732 rows=1572864 loops=1)
Hash Cond: (my_table.id = my_table_1.id)
-> Seq Scan on my_table (cost=0.00..49052.16 rows=1613116
width=14) (actual time=0.004..669.184 rows=1613117 loops=1)
-> Hash (cost=65183.32..65183.32 rows=1572738 width=14) (actual
time=2871.301..2871.301 rows=1572864 loops=1)
Buckets: 131072 Batches: 32 Memory Usage: 3332kB
-> Seq Scan on my_table my_table_1 (cost=0.00..65183.32
rows=1572738 width=14) (actual time=0.009..2115.826 rows=1572864 loops=1)
Filter: ((end_date <= to_date('12/12/2018'::text,
'DD/MM/YYYY'::text)) AND (end_date > to_date('11/12/2018'::text,
'DD/MM/YYYY'::text)))
Rows Removed by Filter: 40253
Planning time: 0.419 ms
Trigger for constraint my_table_id_fkey: time=14291.206 calls=1572864
Trigger for constraint table2_fk: time=29171.591 calls=1572864
Trigger for constraint table3_fk: time=26356.711 calls=1572864
Trigger for constraint table4_fk: time=27579.694 calls=1572864
Trigger for constraint table5_fk: time=27537.491 calls=1572864
Trigger for constraint table6_fk: time=27574.169 calls=1572864
Trigger for constraint table7_fk: time=27716.636 calls=1572864
Trigger for constraint table8_fk: time=27780.192 calls=1572864
....
....

Execution time: 333166.233 ms ~ 5.5 minutes
(23 rows)

Loading into a temp table the data isnt option because I cant truncate the
table because of all the dependencies...

Any idea what else can I check ?

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message pavan95 2018-09-03 06:28:34 Re: Heavy Logging in Subscriber side when configured Logical Replication in 10.4
Previous Message soumitra bhandary 2018-09-03 06:17:37 Re: Not able to increase TPS in Postgres BDR set up using pgbench utility

Browse pgsql-performance by date

  From Date Subject
Next Message Justin Pryzby 2018-09-03 07:06:24 Re: trying to delete most of the table by range of date col
Previous Message Andreas Kretschmer 2018-08-29 18:48:15 Re: Extremely slow when query uses GIST exclusion index