Re: [SQL] can i make this sql query more efficiant?

From: Manfred Koizar <mkoi-pg(at)aon(dot)at>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: "Tomasz Myrta" <jasiek(at)klaster(dot)net>, Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-sql(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: [SQL] can i make this sql query more efficiant?
Date: 2003-04-04 19:03:08
Message-ID: d0ir8vgn2sp6uh2h61nkm4tdasqqn4o18l@4ax.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance pgsql-sql

On Fri, 4 Apr 2003 08:16:01 -0800, Josh Berkus <josh(at)agliodbs(dot)com>
wrote:
>That version is only more efficient for small data sets. I've generally
>found that case statements are slower than subselects for large data sets.

I'd be honestly interested in the circumstances where you made that
observation.

>YMMV.

Yes, it does :-) Out of curiosity I did a few tests with PG 7.2 on my
old notebook:

CREATE TABLE baz (event int, level int);
INSERT INTO baz SELECT (100*random()+0.5), (3*random()+0.5);
INSERT INTO baz SELECT (100*random()+0.5), (3*random()+0.5) FROM baz;
...
INSERT INTO baz SELECT (100*random()+0.5), (3*random()+0.5) FROM baz;
CREATE INDEX baz_event ON baz(event);
ANALYSE baz;

SELECT event,
SUM (CASE level WHEN 1 THEN 1 ELSE 0 END) AS ones,
SUM (CASE level WHEN 2 THEN 1 ELSE 0 END) AS twos,
SUM (CASE level WHEN 3 THEN 1 ELSE 0 END) AS threes
FROM baz GROUP BY event;

SELECT event,
(SELECT count(*) FROM baz a
WHERE level = 1 AND a.event=baz.event) AS ones,
(SELECT count(*) FROM baz a
WHERE level = 2 and a.event=baz.event) AS twos,
(SELECT count(*) FROM baz a
WHERE level = 3 and a.event=baz.event) AS threes
FROM baz GROUP BY event;

tuples case subselect
8K 718.48 msec 16199.88 msec
32K 6168.18 msec 74742.85 msec
128K 25072.34 msec 304585.61 msec

CLUSTER baz_event ON baz; ANALYSE baz;
This changes the subselect plan from seq scan to index scan.

128K 12116.07 msec 17530.85 msec

Add 128K more tuples, so that only the first half of the relation is
clustered.

256K 45663.35 msec 117748.23 msec

CLUSTER baz_event ON baz; ANALYSE baz;

256K 23691.81 msec 35138.26 msec

Maybe it is just the data distribution (100 events, 3 levels,
thousands of tuples) that makes CASE look faster than subselects ...

Servus
Manfred

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Lamar Owen 2003-04-04 19:08:31 Re: [PERFORM] OSS database needed for testing
Previous Message Josh Berkus 2003-04-04 18:07:49 Re: OSS database needed for testing

Browse pgsql-sql by date

  From Date Subject
Next Message Josh Berkus 2003-04-04 19:26:14 Re: [SQL] can i make this sql query more efficiant?
Previous Message Josh Berkus 2003-04-04 16:16:01 Re: can i make this sql query more efficiant?