From: | Wes <wespvp(at)syntegra(dot)com> |
---|---|
To: | Jonel Rienton <jonel(at)rientongroup(dot)com> |
Cc: | 'Postgresql-General' <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Finding orphan records |
Date: | 2006-01-12 06:43:30 |
Message-ID: | BFEB56B2.80B3%wespvp@syntegra.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On 1/12/06 12:23 AM, "Jonel Rienton" <jonel(at)rientongroup(dot)com> wrote:
> Resending sample query, darn where clause didn't wrap
>
> select a.*,b.* from a
> left outer join b on a.id = b.a_id
> where b.id is null;
I tried something along those lines a while back, and it was orders of
magnitude slower. The above produces:
explain select address_key, address from addresses left outer join messages
on addresses.address_key=originator where originator is null;
QUERY PLAN
----------------------------------------------------------------------------
---------------------
Merge Left Join (cost=35684870.14..38457295.97 rows=4090203 width=40)
Merge Cond: ("outer".address_key = "inner".originator)
Filter: ("inner".originator IS NULL)
-> Index Scan using addresses_pkey on addresses (cost=0.00..97213.17
rows=4090203 width=40)
-> Sort (cost=35684870.14..36129462.74 rows=177837040 width=11)
Sort Key: messages.originator
-> Seq Scan on messages (cost=0.00..7215718.40 rows=177837040
width=11)
This appears to be very inefficient. B is almost two orders of magnitude
larger than A. C is about 3-4 times as big as B (record count). My
statement (with the same single 'B' table as above) produces:
narc=> explain select address_key, address from addresses where ( not
exists(select 1 from messages where originator=addresses.address_key limit
1) );
QUERY PLAN
----------------------------------------------------------------------------
--------------------------------
Seq Scan on addresses (cost=0.00..3398462.98 rows=2045102 width=40)
Filter: (NOT (subplan))
SubPlan
-> Limit (cost=0.00..0.81 rows=1 width=0)
-> Index Scan using messages_i_orig_mdate on messages
(cost=0.00..35148.46 rows=43301 width=0)
Index Cond: (originator = $0)
Which seems like it should be much more efficient.
Wes
From | Date | Subject | |
---|---|---|---|
Next Message | Tino Wildenhain | 2006-01-12 07:53:42 | Re: sql (Stored procedure) design question |
Previous Message | Jonel Rienton | 2006-01-12 06:23:02 | Re: Finding orphan records |