RE: Improve a query...

From: Jeff Eckermann <jeckermann(at)verio(dot)net>
To: "'Eric G(dot) Miller'" <egm2(at)jps(dot)net>, PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: RE: Improve a query...
Date: 2001-05-02 14:48:43
Message-ID: 08CD1781F85AD4118E0800A0C9B8580B094A5F@NEZU
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Try:
SELECT DISTINCT ON (org_id) rpt_id, org_id, period,...
FROM reports
ORDER BY period DESC;

> -----Original Message-----
> From: Eric G. Miller [SMTP:egm2(at)jps(dot)net]
> Sent: Tuesday, May 01, 2001 4:04 AM
> To: PostgreSQL General
> Subject: [GENERAL] Improve a query...
>
> Looking for the best way to formulate a query to select
> the most "recent" entry for an organization in a table
> like:
>
> CREATE TABLE reports (
>
> -- Report Id used to link up related 1:M rather than multi-key
> rpt_id SERIAL NOT NULL PRIMARY KEY,
>
> -- A Unique ID for the organization
> org_id char(10) NOT NULL CHECK(CHARACTER_LENGTH(op_id) = 10),
>
> -- The reporting period
> period integer NOT NULL
>
> -- Various and Sundry ...
> .
> .
> .
>
> UNIQUE (org_id,period)
> );
>
> If max(period) for an organization yields the most recent reporting,
> I want to get whatever is the most recent report for each organization.
>
> This query works, but seems expensive...
>
> SELECT a.rpt_id, a.org_id, a.period, ...
> FROM reports As a
> INNER JOIN
> (SELECT b.org_id, max(b.period) As period
> FROM reports b group by b.org_id) As c
> ON a.org_id = c.org_id and a.period = c.period;
>
> EXPLAIN looks thusly:
>
> NOTICE: QUERY PLAN:
>
> Merge Join (cost=147.98..164.48 rows=10 width=48)
> -> Sort (cost=69.83..69.83 rows=1000 width=32)
> -> Seq Scan on reports a (cost=0.00..20.00 rows=1000 width=32)
> -> Sort (cost=78.15..78.15 rows=100 width=16)
> -> Subquery Scan c (cost=69.83..74.83 rows=100 width=16)
> -> Aggregate (cost=69.83..74.83 rows=100 width=16)
> -> Group (cost=69.83..72.33 rows=1000 width=16)
> -> Sort (cost=69.83..69.83 rows=1000 width=16)
> -> Seq Scan on reports b
> (cost=0.00..20.00
> rows=1000 width=16)
>
>
> The data is very hierarchical so I didn't want to carry around alot of
> key fields in related "many-sided" tables which may also have related
> "many-sided" tables. Any ideas on how to minimize the multiple
> scans on the table? The numbers for explain probably aren't telling
> much since there's not much real data in the table at this time...
>
> --
> Eric G. Miller <egm2(at)jps(dot)net>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)

Browse pgsql-general by date

  From Date Subject
Next Message Jeff Meeks 2001-05-02 14:58:39 RE: Joining more than 2 tables
Previous Message Vivek Khera 2001-05-02 14:47:50 best way to implement producer/consumer in Perl