Join vs Subquery

From: Brian Herlihy <btherl(at)yahoo(dot)com(dot)au>
To: Postgresql Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Join vs Subquery
Date: 2007-05-03 06:07:57
Message-ID: 243066.5712.qm@web52308.mail.re2.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi,

I am using postgres 8.1.3 for this. If this has been dealt with later, please disregard. And this is not a complaint or a request, I am just curious, so I know how to best construct my queries.

There is a unique index mapping domains to domain_ids.

views_ts specifies a partitioned table, where views_ts_2007_04_01 is the only partition matching the range given in the query.

My goal is to produce summaries of counts of rows for each day within a given range (can be days, months, years).

The issue: the second query results in a lower cost estimate. I am wondering why the second query plan was not chosen for the first query.

Thanks!
Brian

live=> explain select ts::date,count(*) from views_ts join domains using (domain_id) where domain = '1234.com' and ts >= '2007-04-01' and ts < '2007-04-02' group by ts::date;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=9040.97..9041.00 rows=2 width=8)
-> Hash Join (cost=6.01..9040.96 rows=2 width=8)
Hash Cond: ("outer".domain_id = "inner".domain_id)
-> Append (cost=0.00..7738.01 rows=259383 width=16)
-> Seq Scan on views_ts (cost=0.00..1138.50 rows=1 width=16)
Filter: ((ts >= '2007-04-01 00:00:00+10'::timestamp with time zone) AND (ts < '2007-04-02 00:00:00+10'::timestamp with time zone))
-> Seq Scan on views_ts_2007_04_01 views_ts (cost=0.00..6599.51 rows=259382 width=16)
Filter: ((ts >= '2007-04-01 00:00:00+10'::timestamp with time zone) AND (ts < '2007-04-02 00:00:00+10'::timestamp with time zone))
-> Hash (cost=6.01..6.01 rows=1 width=8)
-> Index Scan using domains_domain on domains (cost=0.00..6.01 rows=1 width=8)
Index Cond: ("domain" = '1234.com'::text)
(11 rows)

live=> explain select ts::date,count(*) from views_ts where domain_id = (select domain_id from domains where domain = '1234.com') and ts >= '2007-04-01' and ts < '2007-04-02' group by ts::date;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=1993.93..1995.99 rows=137 width=8)
InitPlan
-> Index Scan using domains_domain on domains (cost=0.00..6.01 rows=1 width=8)
Index Cond: ("domain" = '1234.com'::text)
-> Result (cost=0.00..1986.69 rows=247 width=8)
-> Append (cost=0.00..1986.07 rows=247 width=8)
-> Seq Scan on views_ts (cost=0.00..1245.75 rows=1 width=8)
Filter: ((domain_id = $0) AND (ts >= '2007-04-01 00:00:00+10'::timestamp with time zone) AND (ts < '2007-04-02 00:00:00+10'::timestamp with time zone))
-> Bitmap Heap Scan on views_ts_2007_04_01 views_ts (cost=2.86..740.32 rows=246 width=8)
Recheck Cond: (domain_id = $0)
Filter: ((ts >= '2007-04-01 00:00:00+10'::timestamp with time zone) AND (ts < '2007-04-02 00:00:00+10'::timestamp with time zone))
-> Bitmap Index Scan on views_ts_2007_04_01_domain_id (cost=0.00..2.86 rows=246 width=0)
Index Cond: (domain_id = $0)

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Ron 2007-05-03 09:18:27 Re: Intermitent slow queries
Previous Message Parks, Aaron B. 2007-05-02 19:07:00 Re: Intermitent slow queries