Re: Finding orphan records

From: Wes <wespvp(at)syntegra(dot)com>
To: Jonel Rienton <jonel(at)rientongroup(dot)com>
Cc: 'Postgresql-General' <pgsql-general(at)postgresql(dot)org>
Subject: Re: Finding orphan records
Date: 2006-01-12 06:43:30
Message-ID: BFEB56B2.80B3%wespvp@syntegra.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 1/12/06 12:23 AM, "Jonel Rienton" <jonel(at)rientongroup(dot)com> wrote:

> Resending sample query, darn where clause didn't wrap
>
> select a.*,b.* from a
> left outer join b on a.id = b.a_id
> where b.id is null;

I tried something along those lines a while back, and it was orders of
magnitude slower. The above produces:

explain select address_key, address from addresses left outer join messages
on addresses.address_key=originator where originator is null;
QUERY PLAN
----------------------------------------------------------------------------
---------------------
Merge Left Join (cost=35684870.14..38457295.97 rows=4090203 width=40)
Merge Cond: ("outer".address_key = "inner".originator)
Filter: ("inner".originator IS NULL)
-> Index Scan using addresses_pkey on addresses (cost=0.00..97213.17
rows=4090203 width=40)
-> Sort (cost=35684870.14..36129462.74 rows=177837040 width=11)
Sort Key: messages.originator
-> Seq Scan on messages (cost=0.00..7215718.40 rows=177837040
width=11)

This appears to be very inefficient. B is almost two orders of magnitude
larger than A. C is about 3-4 times as big as B (record count). My
statement (with the same single 'B' table as above) produces:

narc=> explain select address_key, address from addresses where ( not
exists(select 1 from messages where originator=addresses.address_key limit
1) );
QUERY PLAN
----------------------------------------------------------------------------
--------------------------------
Seq Scan on addresses (cost=0.00..3398462.98 rows=2045102 width=40)
Filter: (NOT (subplan))
SubPlan
-> Limit (cost=0.00..0.81 rows=1 width=0)
-> Index Scan using messages_i_orig_mdate on messages
(cost=0.00..35148.46 rows=43301 width=0)
Index Cond: (originator = $0)

Which seems like it should be much more efficient.

Wes

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tino Wildenhain 2006-01-12 07:53:42 Re: sql (Stored procedure) design question
Previous Message Jonel Rienton 2006-01-12 06:23:02 Re: Finding orphan records