From: | Julian Scarfe <jscarfe(at)callnetuk(dot)com> |
---|---|
To: | Peter Eisentraut <peter_e(at)gmx(dot)net> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-sql <pgsql-sql(at)postgresql(dot)org> |
Subject: | Re: [SQL] DISTINCT ON: speak now or forever hold your peace |
Date: | 2000-01-25 13:23:58 |
Message-ID: | B4B3546E.5A80%jscarfe@callnetuk.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers pgsql-sql |
on 25/1/00 10:59, Peter Eisentraut at e99re41(at)DoCS(dot)UU(dot)SE wrote:
> On Tue, 25 Jan 2000, Julian Scarfe wrote:
>> I use 'distinct on' to get the most recent reports for each of a group of
>> locations. E.g.:
>>
>> create table reports (
>> location varchar(16),
>> report_time datetime,
>> report_text text);
>>
>> select distinct on location * from reports where location ~~ 'Lond%' order by
>> location, reporttime desc;<pgsql-hackers(at)postgreSQL(dot)org>
>>
>> to get the tuples that offer the most recent reports for each of London,
>> Londonderry, Londy etc.
>>
>> Is there an alternative?
>
> select location, max(report_time) from reports group by location
>
> This also has the negligible advantage that it gives you determinate
> results.
But I want report_text *too* -- that's the important bit, and it's not an
aggregate or common column in the group. So:
> Am I wrong or can DISTINCT ON not be rewritten in terms of GROUP BY in all
> cases?
I don't think so.
It has to be something like:
select * from reports as r1 where r1.report_time = (select max(report_time)
from reports as r2 where r2.location = r1.location) and r1.location ~~
'Lond%';
However, to my surprise, that's as quick as doing the 'distinct on'.
host=> explain select * from reports as r1 where r1.report_time = (select
max(report_time) from reports as r2 where r2.location = r1.location) and
r1.location ~~ 'Lond%';
NOTICE: QUERY PLAN:
Index Scan using reports_by_location_issuetime on reports r1 (cost=186.24
size=2 width=334)
SubPlan
-> Aggregate (cost=3.62 size=0 width=0)
-> Index Scan using reports_by_location_report_time on reports r2
(cost=3.62 size=13 width=8)
EXPLAIN
host=> explain select distinct on location * from reports where location ~~
'Lond%' order by location, report_time desc;
NOTICE: QUERY PLAN:
Unique (cost=186.24 size=0 width=0)
-> Sort (cost=186.24 size=0 width=0)
-> Index Scan using reports_by_location_issuetime on reports
(cost=186.24 size=2 width=334)
I'm surprise that the "reports_by_location_issuetime" index got used, since
it involves the issuetime field which is not the same as the report_time
field. I edited it and numerous other fields from the quoted 'create table'
for simplicity. There's also a "reports_by_location_report_time" which
indexes on, obviously, (location, report_time). Comments?
Julian Scarfe, learning...
From | Date | Subject | |
---|---|---|---|
Next Message | Bruce Momjian | 2000-01-25 14:14:29 | Re: [HACKERS] Happy column dropping |
Previous Message | Peter Eisentraut | 2000-01-25 11:01:22 | Re: [HACKERS] Well, then you keep your darn columns |
From | Date | Subject | |
---|---|---|---|
Next Message | Don Baccus | 2000-01-25 15:41:38 | Re: [HACKERS] DISTINCT ON: speak now or forever hold your peace |
Previous Message | Palle Girgensohn | 2000-01-25 11:29:07 | Re: [SQL] Problem with large tuples. |