Re: UNION versus SUB SELECT

From: Robert DiFalco <robert(dot)difalco(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: desmodemone <desmodemone(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: UNION versus SUB SELECT
Date: 2013-11-22 17:36:45
Message-ID: CAAXGW-xj0mCFTFtE=BqwCv-8t=3OF1bB+_vgft8kHLSy7y-D_Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

So far that one was the worst in terms of cost and time. Here are all the
plans with buffers, more records, and results being returned. At this point
I have good enough performance with my UNION approach but I'm just trying
to learn now. WHY is the union approach the fastest? I would have expected
the EXISTS or IN approaches to be faster or at least have the SAME cost? At
this point I just want to understand.

dft1fjfv106r48=> explain (analyze, buffers, verbose)
select *
from contacts c
where EXISTS(
(select 1 from phone_numbers p where c.id = p.contact_id and p.national
= 5038904993 and p.e164 = '+15038904993')
union
(select 1 id = e.contact_id and e.email = 'robert(dot)xxxx(at)gmail(dot)com'));

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on public.contacts c (cost=0.00..23238.90 rows=1425 width=95)
(actual time=2.241..46.817 rows=7 loops=1)
Output: c.id, c.owner_id, c.user_id, c.device_id, c.last_call,
c.record_id, c.dtype, c.blocked, c.details_hash, c.fname, c.lname, c.fb_id
Filter: (SubPlan 1)
Rows Removed by Filter: 2843
Buffers: shared hit=11497
SubPlan 1
-> Unique (cost=8.13..8.13 rows=2 width=0) (actual time=0.015..0.015
rows=0 loops=2850)
Output: (1)
Buffers: shared hit=11440
-> Sort (cost=8.13..8.13 rows=2 width=0) (actual
time=0.013..0.013 rows=0 loops=2850)
Output: (1)
Sort Key: (1)
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=11440
-> Append (cost=0.06..8.13 rows=2 width=0) (actual
time=0.009..0.009 rows=0 loops=2850)
Buffers: shared hit=11440
-> Index Only Scan using idx_phone on
public.phone_numbers p (cost=0.06..4.06 rows=1 width=0) (actual
time=0.003..0.003 rows=0 loops=2850)
Output: 1
Index Cond: ((p.contact_id = c.id) AND
(p."national" = 5038904993::bigint) AND (p.e164 = '+15038904993'::text))
Heap Fetches: 11
Buffers: shared hit=5721
-> Index Only Scan using idx_email_full on
public.email_addresses e (cost=0.06..4.06 rows=1 width=0) (actual
time=0.003..0.003 rows=0 loops=2850)
Output: 1
Index Cond: ((e.contact_id = c.id) AND
(e.email = 'robert(dot)xxxx(at)gmail(dot)com'::text))
Heap Fetches: 5
Buffers: shared hit=5719
Total runtime: 46.897 ms
(27 rows)

dft1fjfv106r48=> explain (analyze, buffers, verbose)
select *
from contacts where id IN (
(select c.id
from contacts c join phone_numbers p on c.id = p.contact_id and
p.national = 5038904993 and p.e164 = '+15038904993')
union
(select c.id from contacts c join email_addresses e on c.id =
e.contact_id and e.email = 'robert(dot)xxxx(at)gmail(dot)com'));

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=29.38..53.74 rows=6 width=95) (actual time=0.356..0.418
rows=7 loops=1)
Output: contacts.id, contacts.owner_id, contacts.user_id,
contacts.device_id, contacts.last_call, contacts.record_id, contacts.dtype,
contacts.blocked, contacts.details_hash, contacts.fname, contacts.lname,
contacts.fb_id
Buffers: shared hit=87
-> HashAggregate (cost=29.32..29.34 rows=6 width=8) (actual
time=0.347..0.354 rows=7 loops=1)
Output: c.id
Buffers: shared hit=66
-> Append (cost=0.11..29.32 rows=6 width=8) (actual
time=0.047..0.316 rows=16 loops=1)
Buffers: shared hit=66
-> Nested Loop (cost=0.11..8.12 rows=1 width=8) (actual
time=0.045..0.169 rows=11 loops=1)
Output: c.id
Buffers: shared hit=43
-> Index Scan using idx_phone_address on
public.phone_numbers p (cost=0.06..4.06 rows=1 width=8) (actual
time=0.027..0.047 rows=11 loops=1)
Output: p.id, p.contact_id, p."national",
p.e164, p.raw_number
Index Cond: ((p."national" = 5038904993::bigint)
AND ((p.e164)::text = '+15038904993'::text))
Buffers: shared hit=9
-> Index Only Scan using idx_contacts_pkey_owner on
public.contacts c (cost=0.06..4.06 rows=1 width=8) (actual
time=0.005..0.006 rows=1 loops=11)
Output: c.id, c.owner_id, c.user_id
Index Cond: (c.id = p.contact_id)
Heap Fetches: 11
Buffers: shared hit=34
-> Nested Loop (cost=2.12..21.17 rows=5 width=8) (actual
time=0.057..0.114 rows=5 loops=1)
Output: c_1.id
Buffers: shared hit=23
-> Bitmap Heap Scan on public.email_addresses e
(cost=2.06..8.85 rows=5 width=8) (actual time=0.044..0.055 rows=5 loops=1)
Output: e.id, e.contact_id, e.email
Recheck Cond: ((e.email)::text = '
robert(dot)xxxx(at)gmail(dot)com'::text)
Buffers: shared hit=7
-> Bitmap Index Scan on idx_email_address
(cost=0.00..2.06 rows=5 width=0) (actual time=0.031..0.031 rows=6 loops=1)
Index Cond: ((e.email)::text = '
robert(dot)xxxx(at)gmail(dot)com'::text)
Buffers: shared hit=2
-> Index Only Scan using idx_contacts_pkey_owner on
public.contacts c_1 (cost=0.06..2.46 rows=1 width=8) (actual
time=0.005..0.006 rows=1 loops=5)
Output: c_1.id, c_1.owner_id, c_1.user_id
Index Cond: (c_1.id = e.contact_id)
Heap Fetches: 5
Buffers: shared hit=16
-> Index Scan using idx_contacts_pkey_owner on public.contacts
(cost=0.06..4.06 rows=1 width=95) (actual time=0.003..0.004 rows=1 loops=7)
Output: contacts.id, contacts.owner_id, contacts.user_id,
contacts.device_id, contacts.last_call, contacts.record_id, contacts.dtype,
contacts.blocked, contacts.details_hash, contacts.fname, contacts.lname,
contacts.fb_id
Index Cond: (contacts.id = c.id)
Buffers: shared hit=21
Total runtime: 0.535 ms
(40 rows)

dft1fjfv106r48=> explain (analyze, buffers, verbose)
select c.*
from contacts c
where EXISTS(select 1 from phone_numbers p where (p.national = 5038904993
and p.e164 = '+15038904993') and p.contact_id = c.id)
or EXISTS(select 1 from email_addresses e where e.email = '
robert(dot)xxxx(at)gmail(dot)com' and e.contact_id = c.id);
QUERY
PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on public.contacts c (cost=0.00..23213.25 rows=2138 width=95)
(actual time=0.209..1.290 rows=7 loops=1)
Output: c.id, c.owner_id, c.user_id, c.device_id, c.last_call,
c.record_id, c.dtype, c.blocked, c.details_hash, c.fname, c.lname, c.fb_id
Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR (alternatives:
SubPlan 3 or hashed SubPlan 4))
Rows Removed by Filter: 2843
Buffers: shared hit=73
SubPlan 1
-> Index Only Scan using idx_phone on public.phone_numbers p
(cost=0.06..4.06 rows=1 width=0) (never executed)
Index Cond: ((p.contact_id = c.id) AND (p."national" =
5038904993::bigint) AND (p.e164 = '+15038904993'::text))
Heap Fetches: 0
SubPlan 2
-> Index Scan using idx_phone_address on public.phone_numbers p_1
(cost=0.06..4.06 rows=1 width=8) (actual time=0.033..0.056 rows=11 loops=1)
Output: p_1.contact_id
Index Cond: ((p_1."national" = 5038904994::bigint) AND
((p_1.e164)::text = '+15038904993'::text))
Buffers: shared hit=9
SubPlan 3
-> Index Only Scan using idx_email_full on public.email_addresses e
(cost=0.06..4.06 rows=1 width=0) (never executed)
Index Cond: ((e.contact_id = c.id) AND (e.email = '
robert(dot)xxxx(at)gmail(dot)com'::text))
Heap Fetches: 0
SubPlan 4
-> Bitmap Heap Scan on public.email_addresses e_1 (cost=2.06..8.85
rows=5 width=8) (actual time=0.040..0.050 rows=5 loops=1)
Output: e_1.contact_id
Recheck Cond: ((e_1.email)::text = 'robert(dot)xxxx(at)gmail(dot)com'::text)
Buffers: shared hit=7
-> Bitmap Index Scan on idx_email_address (cost=0.00..2.06
rows=5 width=0) (actual time=0.030..0.030 rows=6 loops=1)
Index Cond: ((e_1.email)::text = 'robert(dot)xxxx(at)gmail(dot)com
'::text)
Buffers: shared hit=2
Total runtime: 1.395 ms
(27 rows)

dft1fjfv106r48=> explain (analyze, buffers, verbose)
select c.*
from contacts c
where c.id IN (
select p.contact_id from phone_numbers p where (p.national = 5038904993
and p.e164 = '+15038904993'))
or c.id IN (
select e.contact_id from email_addresses e where e.email = '
robert(dot)xxxx(at)gmail(dot)com');
QUERY
PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on public.contacts c (cost=12.92..81.32 rows=2138 width=95)
(actual time=0.208..1.283 rows=7 loops=1)
Output: c.id, c.owner_id, c.user_id, c.device_id, c.last_call,
c.record_id, c.dtype, c.blocked, c.details_hash, c.fname, c.lname, c.fb_id
Filter: ((hashed SubPlan 1) OR (hashed SubPlan 2))
Rows Removed by Filter: 2843
Buffers: shared hit=73
SubPlan 1
-> Index Scan using idx_phone_address on public.phone_numbers p
(cost=0.06..4.06 rows=1 width=8) (actual time=0.032..0.054 rows=11 loops=1)
Output: p.contact_id
Index Cond: ((p."national" = 5038904993::bigint) AND
((p.e164)::text = '+15038904993'::text))
Buffers: shared hit=9
SubPlan 2
-> Bitmap Heap Scan on public.email_addresses e (cost=2.06..8.85
rows=5 width=8) (actual time=0.040..0.049 rows=5 loops=1)
Output: e.contact_id
Recheck Cond: ((e.email)::text = 'robert(dot)xxxx(at)gmail(dot)com'::text)
Buffers: shared hit=7
-> Bitmap Index Scan on idx_email_address (cost=0.00..2.06
rows=5 width=0) (actual time=0.031..0.031 rows=6 loops=1)
Index Cond: ((e.email)::text = 'robert(dot)xxxx(at)gmail(dot)com
'::text)
Buffers: shared hit=2
Total runtime: 1.371 ms
(19 rows)

dft1fjfv106r48=>

On Fri, Nov 22, 2013 at 7:54 AM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:

> On Thu, Nov 21, 2013 at 2:31 PM, desmodemone <desmodemone(at)gmail(dot)com>
> wrote:
> > Hi Robert, could you try with "exists" ?
> >
> > SELECT c.*
> > FROM contacts c
> > WHERE exists ( SELECT 1 FROM phone p WHERE p.addr =? and
> > p.contact_id=c.id )
> > OR exists (SELECT 1 FROM email e WHERE e.addr = ? and e.contact_id=
> c.id );
>
> hm, how about:
> SELECT c.*
> FROM contacts c
> WHERE exists (
> SELECT 1
> FROM phone p
> WHERE p.addr =? AND p.contact_id=c.id
> UNION ALL
> SELECT 1 FROM email e
> WHERE e.addr = ? AND e.contact_id=c.id
> );
>
> merlin
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Lee Nguyen 2013-11-25 20:01:53 Postgresql in a Virtual Machine
Previous Message Merlin Moncure 2013-11-22 15:54:29 Re: UNION versus SUB SELECT