Re: Scrub one large table against another

From: Brendan Curran <brendan(dot)curran(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Scrub one large table against another
Date: 2006-10-10 22:37:00
Message-ID: 452C208C.5010603@gmail.com
Lists: pgsql-performance

Tom Lane wrote:
> Brendan Curran <brendan(dot)curran(at)gmail(dot)com> writes:
>> CREATE TEMP TABLE temp_list_suppress(email_record_id int8);
>
>> INSERT INTO temp_list_suppress
>> SELECT email_record_id from ONLY email_record er
>> WHERE email_list_id = 9 AND email IN
>> (select email from suppress);
>
>> CREATE INDEX unique_id_index on temp_list_suppress ( email_record_id );
>
>> INSERT INTO er_banned
>> SELECT * from ONLY email_record er WHERE EXISTS
>> (SELECT 1 from temp_list_suppress ts where er.email_record_id =
>> ts.email_record_id);
>
>> DELETE FROM ONLY email_record WHERE email_list_id = 9 AND email_record_id IN
>> (SELECT email_record_id from temp_list_suppress);
>
>> TRUNCATE TABLE temp_list_suppress;
>> DROP TABLE temp_list_suppress;
>
>> The performance is dreadful; is there a more efficient way to do this?
>
> Have you tried doing EXPLAIN ANALYZE of each of the INSERT/DELETE steps?
> If you don't even know which part is slow, it's hard to improve.

FIRST INSERT (just the SELECT is explained):

Hash Join  (cost=8359220.68..9129843.00 rows=800912 width=32)
  Hash Cond: (("outer".email)::text = ("inner".email)::text)
  ->  Unique  (cost=4414093.19..4522324.49 rows=21646260 width=25)
        ->  Sort  (cost=4414093.19..4468208.84 rows=21646260 width=25)
              Sort Key: suppress.email
              ->  Seq Scan on suppress  (cost=0.00..393024.60 rows=21646260 width=25)
  ->  Hash  (cost=3899868.47..3899868.47 rows=4606808 width=32)
        ->  Bitmap Heap Scan on email_record er  (cost=38464.83..3899868.47 rows=4606808 width=32)
              Recheck Cond: (email_list_id = 13)
              ->  Bitmap Index Scan on list  (cost=0.00..38464.83 rows=4606808 width=0)
                    Index Cond: (email_list_id = 13)

SECOND INSERT (using EXISTS):

Seq Scan on email_record er  (cost=0.00..381554175.29 rows=62254164 width=1863)
  Filter: (subplan)
  SubPlan
    ->  Index Scan using er_primeq_pk on er_primeq eq  (cost=0.00..3.03 rows=1 width=0)
          Index Cond: ($0 = email_record_id)

SECOND INSERT (using IN):

Nested Loop  (cost=26545.94..2627497.28 rows=27134 width=1863)
  ->  HashAggregate  (cost=26545.94..33879.49 rows=733355 width=8)
        ->  Seq Scan on er_primeq  (cost=0.00..24712.55 rows=733355 width=8)
  ->  Index Scan using email_record_pkey on email_record er  (cost=0.00..3.52 rows=1 width=1863)
        Index Cond: (er.email_record_id = "outer".email_record_id)
        Filter: (email_list_id = 13)

DELETE:

Nested Loop  (cost=26545.94..2627497.28 rows=50846 width=6)
  ->  HashAggregate  (cost=26545.94..33879.49 rows=733355 width=8)
        ->  Seq Scan on er_primeq  (cost=0.00..24712.55 rows=733355 width=8)
  ->  Index Scan using email_record_pkey on email_record  (cost=0.00..3.52 rows=1 width=14)
        Index Cond: (email_record.email_record_id = "outer".email_record_id)
        Filter: (email_list_id = 9)

To get this explain data I used a sample "temp_suppress" table that
contained about 700k rows and was indexed but not analyzed...
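(Note that the plans above are plain EXPLAIN estimates. To see actual per-node runtimes as Tom asked, each step could be wrapped like this; since EXPLAIN ANALYZE really executes the statement, rolling back keeps the tables untouched:

```sql
-- EXPLAIN ANALYZE executes the statement for real, so wrap
-- data-modifying steps in a transaction and roll it back.
BEGIN;
EXPLAIN ANALYZE
INSERT INTO temp_list_suppress
SELECT email_record_id FROM ONLY email_record er
WHERE email_list_id = 9 AND email IN (SELECT email FROM suppress);
ROLLBACK;
```
)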

>
> It would probably help to do an "ANALYZE temp_list_suppress" right after
> populating the temp table. As you have it, the second insert and delete
> are being planned with nothing more than a row count (obtained during
> CREATE INDEX) and no stats about distribution of the table contents.
>
> Also, I'd be inclined to try replacing the EXISTS with an IN test;
> in recent PG versions the planner is generally smarter about IN.
> (Is there a reason why you are doing the INSERT one way and the
> DELETE the other?)
>
> BTW, that TRUNCATE right before the DROP seems quite useless,
> although it's not the main source of your problem.
>
> regards, tom lane
>
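Putting those suggestions together, an untested sketch of the revised sequence might look like the following (same tables as above, with ANALYZE added after populating the temp table and IN used consistently for both the INSERT and the DELETE):

```sql
CREATE TEMP TABLE temp_list_suppress(email_record_id int8);

INSERT INTO temp_list_suppress
SELECT email_record_id FROM ONLY email_record
WHERE email_list_id = 9 AND email IN (SELECT email FROM suppress);

CREATE INDEX unique_id_index ON temp_list_suppress (email_record_id);
ANALYZE temp_list_suppress;  -- give the planner real distribution stats

INSERT INTO er_banned
SELECT * FROM ONLY email_record
WHERE email_record_id IN (SELECT email_record_id FROM temp_list_suppress);

DELETE FROM ONLY email_record
WHERE email_list_id = 9
  AND email_record_id IN (SELECT email_record_id FROM temp_list_suppress);

DROP TABLE temp_list_suppress;  -- the TRUNCATE beforehand is unnecessary
```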
