From: | "Andy" <frum(at)ar-sd(dot)net> |
---|---|
To: | "Steinar H(dot) Gunderson" <sgunderson(at)bigfoot(dot)com>, <pgsql-performance(at)postgresql(dot)org> |
Subject: | Re: Massive delete performance |
Date: | 2005-10-11 13:27:48 |
Message-ID: | 01e401c5ce67$93d65de0$0b00a8c0@forge |
Lists: | pgsql-performance |
We run the DB on a Linux system; the client has a Windows system. The
application is almost the same (so the database structure is about 80% the
same). The difference is that the client does not need all the tables, and
the remaining tables contain a lot of extra data that does not belong to this
client. Every night we have to send updated "info" to the client database.
Our (I have to admit) "fast and not the best" solution was to replicate the
needed tables and then delete from them the info that is not needed.
So I send this client a "dump" of the database.
I also find this idea "not the best", but in two days I couldn't come up with
another fast solution, and it has worked this way for 4 months.
Our database is not THAT big (500MB), the replication about 300MB...
everything works fast enough except this delete...
How can I make the cascade deletes show up in EXPLAIN ANALYZE as well?
In answer to Sean Davis <sdavis2(at)mail(dot)nih(dot)gov>:
EXPLAIN ANALYZE
DELETE FROM report
WHERE id_order IN (
    SELECT o.id
    FROM orders o
    WHERE o.id_ag NOT IN (
        SELECT cp.id_ag
        FROM users u
        INNER JOIN contactpartner cp ON cp.id_user = u.id
        WHERE u.name IN ('dc')
        ORDER BY cp.id_ag
    )
);
Hash IN Join  (cost=3532.83..8182.33 rows=32042 width=6) (actual time=923.456..2457.323 rows=59557 loops=1)
  Hash Cond: ("outer".id_order = "inner".id)
  ->  Seq Scan on report  (cost=0.00..2613.83 rows=64083 width=10) (actual time=33.269..1159.024 rows=64083 loops=1)
  ->  Hash  (cost=3323.31..3323.31 rows=32608 width=4) (actual time=890.021..890.021 rows=0 loops=1)
        ->  Seq Scan on orders o  (cost=21.12..3323.31 rows=32608 width=4) (actual time=58.428..825.306 rows=60596 loops=1)
              Filter: (NOT (hashed subplan))
              SubPlan
                ->  Sort  (cost=21.11..21.12 rows=3 width=4) (actual time=47.612..47.612 rows=1 loops=1)
                      Sort Key: cp.id_ag
                      ->  Nested Loop  (cost=0.00..21.08 rows=3 width=4) (actual time=47.506..47.516 rows=1 loops=1)
                            ->  Index Scan using users_name_idx on users u  (cost=0.00..5.65 rows=1 width=4) (actual time=20.145..20.148 rows=1 loops=1)
                                  Index Cond: ((name)::text = 'dc'::text)
                            ->  Index Scan using contactpartner_id_user_idx on contactpartner cp  (cost=0.00..15.38 rows=4 width=8) (actual time=27.348..27.352 rows=1 loops=1)
                                  Index Cond: (cp.id_user = "outer".id)
Total runtime: 456718.658 ms
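Editor's note: the plan's top node finishes in about 2.5 seconds, yet the
total runtime is about 457 seconds. In PostgreSQL that gap is typically time
spent in foreign-key triggers, i.e. cascaded deletes into tables that
reference the deleted rows, and those triggers do not appear as plan nodes.
Since PostgreSQL does not automatically index the referencing side of a
foreign key, each cascaded delete can force a sequential scan. A minimal
sketch of the usual fix follows; the index name, the assumption that other
tables reference `report` or `orders` via `ON DELETE CASCADE`, and the
`NOT EXISTS` rewrite are illustrations, not taken from the original thread:

```sql
-- Hypothetical: index the referencing column of any ON DELETE CASCADE
-- foreign key that points at the rows being deleted, so each cascaded
-- delete becomes an index lookup instead of a sequential scan.
CREATE INDEX report_id_order_idx ON report (id_order);

-- The NOT IN subselect can also be phrased as NOT EXISTS, which planners
-- of this era often handle better; the ORDER BY inside IN (...) does
-- nothing and can simply be dropped.
DELETE FROM report
WHERE id_order IN (
    SELECT o.id
    FROM orders o
    WHERE NOT EXISTS (
        SELECT 1
        FROM users u
        JOIN contactpartner cp ON cp.id_user = u.id
        WHERE u.name = 'dc'
          AND cp.id_ag = o.id_ag
    )
);
```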
----- Original Message -----
From: "Steinar H. Gunderson" <sgunderson(at)bigfoot(dot)com>
To: <pgsql-performance(at)postgresql(dot)org>
Sent: Tuesday, October 11, 2005 3:19 PM
Subject: Re: [PERFORM] Massive delete performance
> On Tue, Oct 11, 2005 at 10:47:03AM +0300, Andy wrote:
>> So, I have a replication with only the tables that I need to send; then I
>> make a copy of this replication, and from this copy I delete all the data
>> that is not needed.
>>
>> How can I speed up this DELETE procedure? It is really slow, and there is
>> of course a lot of data to be deleted.
>
> Instead of copying and then deleting, could you try just selecting out
> what you wanted in the first place?
>
> /* Steinar */
> --
> Homepage: http://www.sesse.net/
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>
>
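Editor's note: Steinar's suggestion can be sketched in SQL against the schema
visible in the query above. Instead of copying every table and deleting the
unwanted rows, build the per-client copy from a SELECT that keeps only the
wanted rows. The table name `report_client` and the exact join path are
assumptions for illustration:

```sql
-- Hypothetical: materialize only the rows belonging to the client 'dc',
-- then dump report_client instead of copy-then-delete on report.
CREATE TABLE report_client AS
SELECT r.*
FROM report r
JOIN orders o ON o.id = r.id_order
WHERE o.id_ag IN (
    SELECT cp.id_ag
    FROM users u
    JOIN contactpartner cp ON cp.id_user = u.id
    WHERE u.name = 'dc'
);
```

This turns the nightly job into a pure read plus bulk write, avoiding the
cascade-delete cost entirely.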