From: | Markus Bertheau <twanger(at)bluetwanger(dot)de> |
---|---|
To: | Josh Berkus <josh(at)agliodbs(dot)com> |
Cc: | postgres sql list <pgsql-sql(at)postgresql(dot)org> |
Subject: | Re: reforming query for 7.0.2 |
Date: | 2002-01-17 12:56:21 |
Message-ID: | 1011272192.15907.24.camel@entwicklung01.cenes.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
Josh,
> Markus,
>
> > how do I reform this query to work with 7.0.2?
>
> Better question: Why are you working with 7.0.2? Even the mass-market Linux
> distros (like Red Hat and SuSE) now come with 7.1.x.
Yeah, I know... An upgrade is definetely in the queue. But it will not
happen before 7.2 (if my imagination of its release date is correct), so
we will most probably go to that.
>
> > select * from personen join (select count(personen_id), personen_id from
> > orders group by personen_id) as ordertemp on personen.personen_id =
> > ordertemp.personen_id
>
> Acutally, this query needs some reforming on its own. While it will work, the
> following version will achieve the same result, much faster, in 7.1 (and 7.2,
> for that matter):
>
> SELECT personen.field1, personen.field2, personen.field3, count(order_id) as
> no_orders
> FROM personen LEFT OUTER JOIN orders ON personen.personen_id =
> orders.personen_id
> GROUP BY personen.field1, personen.field2, personen.field3
>
> ... you see, your subselect above is completely unnecessary. And slower than a
> LEFT OUTER JOIN. Simplicity, simplicity, simplicity!
Thank you for that.
> Now, we have the problem of no LEFT JOIN support in 7.0, so:
>
> SELECT personen.field1, personen.field2, personen.field3, count(order_id) as
> no_orders
> FROM personen JOIN orders ON personen.personen_id = orders.personen_id
> GROUP BY personen.field1, personen.field2, personen.field3
> UNION
> SELECT personen.field1, personen.field2, personen.field3, 0 as no_orders
> FROM personen
> WHERE NOT EXISTS (SELECT personen_id FROM orders WHERE personen_id =
> personen.personen_id);
Well, I actually don't need the left outer join, because I don't want
persons included that have no orders. I think the original query didn't
include them. So I turn the left outer join into a join and have the
same query on both versions :-). (We use 7.1.3 on our development
server).
Thanks again.
But let's take a look at explain. The old query shows a much smaller
cost estimate compared to the new one. Both databases are vacuumed on a
regular (daily) basis. First 7.1.3:
cenes_test=> select version();
version
-------------------------------------------------------------
PostgreSQL 7.1.3 on i686-pc-linux-gnu, compiled by GCC 2.96
(1 row)
// old query
cenes_test=> explain select * from personen join (select
count(orders_id), personen_id from orders group by personen_id) as
ordertemp on ordertemp.personen_id = personen.personen_id;
NOTICE: QUERY PLAN:
Nested Loop (cost=11.76..42.32 rows=15 width=272)
-> Subquery Scan ordertemp (cost=11.76..12.50 rows=15 width=8)
-> Aggregate (cost=11.76..12.50 rows=15 width=8)
-> Group (cost=11.76..12.13 rows=147 width=8)
-> Sort (cost=11.76..11.76 rows=147 width=8)
-> Seq Scan on orders (cost=0.00..6.47
rows=147 width=8)
-> Index Scan using personen_pkey on personen (cost=0.00..2.02
rows=1 width=264)
EXPLAIN
// new query
cenes_test=> explain select personen.nachname, personen.vorname,
personen.firma, personen.personen_id, count(orders_id) from personen
join orders on personen.personen_id = orders.personen_id group by
personen.nachname, personen.vorname, personen.firma,
personen.personen_id;
NOTICE: QUERY PLAN:
Aggregate (cost=162.53..164.37 rows=15 width=48)
-> Group (cost=162.53..164.00 rows=147 width=48)
-> Sort (cost=162.53..162.53 rows=147 width=48)
-> Merge Join (cost=139.59..157.24 rows=147 width=48)
-> Sort (cost=127.83..127.83 rows=1265 width=40)
-> Seq Scan on personen (cost=0.00..62.65
rows=1265 width=40)
-> Sort (cost=11.76..11.76 rows=147 width=8)
-> Seq Scan on orders (cost=0.00..6.47
rows=147 width=8)
EXPLAIN
// That's a lot more. About row estimates:
cenes_test=> select count(personen_id) from personen;
count
-------
1272
(1 row)
cenes_test=> select count(orders_id) from orders;
count
-------
189
(1 row)
cenes_test=> select count(orders_id) from orders group by personen_id
order by count desc;
count
-------
41
33
15
12
10
8
7
5
4
3
2
2
2
2
2
2
2
1
// 36 x 1 snipped
(54 rows)
On the production db:
cenes=> select version();
version
---------------------------------------------------------------
PostgreSQL 7.0.2 on i686-pc-linux-gnu, compiled by gcc 2.95.2
(1 row)
// old query doesn't work
// new query
cenes=> explain select personen.nachname, personen.vorname,
personen.firma, personen.personen_id, count(orders_id) from personen
join orders on personen.personen_id = orders.personen_id group by
personen.nachname, personen.vorname, personen.firma,
personen.personen_id;
NOTICE: QUERY PLAN:
Aggregate (cost=201.91..204.87 rows=24 width=48)
-> Group (cost=201.91..204.28 rows=237 width=48)
-> Sort (cost=201.91..201.91 rows=237 width=48)
-> Merge Join (cost=170.85..192.56 rows=237 width=48)
-> Sort (cost=18.72..18.72 rows=237 width=8)
-> Seq Scan on orders (cost=0.00..9.37
rows=237 width=8)
-> Sort (cost=152.13..152.13 rows=1500 width=40)
-> Seq Scan on personen (cost=0.00..73.00
rows=1500 width=40)
EXPLAIN
cenes=> select count(personen_id) from personen;
count
-------
1501
(1 row)
cenes=> select count(orders_id) from orders;
count
-------
238
(1 row)
count
-------
34
16
12
12
8
7
6
5
4
2
2
2
2
2
2
2
2
2
2
2
2
2
2
1
// the rest is 1
(129 rows)
How's that?
Markus Bertheau
From | Date | Subject | |
---|---|---|---|
Next Message | Jeff Eckermann | 2002-01-17 15:13:10 | Re: Pattern Matching on Columns |
Previous Message | Bhuvan A | 2002-01-17 05:16:54 | Pattern Matching on Columns |