From: | Greg Stark <gsstark(at)mit(dot)edu> |
---|---|
To: | Jan Danielsson <jan(dot)danielsson(at)gmail(dot)com> |
Cc: | pgsql-sql(at)postgresql(dot)org |
Subject: | Re: two count columns? |
Date: | 2006-02-18 00:51:27 |
Message-ID: | 87mzgp1oc0.fsf@stark.xeocode.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
Jan Danielsson <jan(dot)danielsson(at)gmail(dot)com> writes:
> select from_ip, count(from_ip) as entries, count(select * from log where
> ...) as tot_entries, max(ts)::timestamp(0) as last_access from log where
> to_port=22 and direction='in' group by from_ip
select from_ip,
count(from_ip) as entries,
(select count(*)
from log as l
where l.from_ip = log.from_ip
) as tot_entries,
max(ts) as last_access
from log
where to_port=22
and direction='in
group by from_ip
expect it to be pretty slow though. For every from_ip it has to look up every
other entry with that from_ip.
> Thankful for any hints or tips.
There is a trick you could use to make it faster but it gets cumbersome and
pretty tricky to use when you're doing more than one thing at a time:
select from_ip
sum(case when to_port=22 and direction='in' then 1 else 0 end) as entries,
count(*) as tot_entries,
max(case when to_port=22 and direction='in' then ts::timestamp(0) else null end) as last_access
from log
group by from_ip
having entries > 0
Note that in either case you might want to look at ANALYZE results for the
query and try raising work_mem for this query using SET until you see the plan
using a hash aggregate. If it can use a hash aggregate for your query (more
likely for the first query than the second) without swapping it'll be faster
than sorting.
--
greg
From | Date | Subject | |
---|---|---|---|
Next Message | Greg Stark | 2006-02-18 07:36:25 | Re: view of weekly data |
Previous Message | Tom Lane | 2006-02-18 00:47:22 | Re: two count columns? |