Re: two count columns?

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Jan Danielsson <jan(dot)danielsson(at)gmail(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: two count columns?
Date: 2006-02-18 00:51:27
Message-ID: 87mzgp1oc0.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql


Jan Danielsson <jan(dot)danielsson(at)gmail(dot)com> writes:

> select from_ip, count(from_ip) as entries, count(select * from log where
> ...) as tot_entries, max(ts)::timestamp(0) as last_access from log where
> to_port=22 and direction='in' group by from_ip

select from_ip,
count(from_ip) as entries,
(select count(*)
from log as l
where l.from_ip = log.from_ip
) as tot_entries,
max(ts) as last_access
from log
where to_port=22
and direction='in
group by from_ip

expect it to be pretty slow though. For every from_ip it has to look up every
other entry with that from_ip.

> Thankful for any hints or tips.

There is a trick you could use to make it faster but it gets cumbersome and
pretty tricky to use when you're doing more than one thing at a time:

select from_ip
sum(case when to_port=22 and direction='in' then 1 else 0 end) as entries,
count(*) as tot_entries,
max(case when to_port=22 and direction='in' then ts::timestamp(0) else null end) as last_access
from log
group by from_ip
having entries > 0

Note that in either case you might want to look at ANALYZE results for the
query and try raising work_mem for this query using SET until you see the plan
using a hash aggregate. If it can use a hash aggregate for your query (more
likely for the first query than the second) without swapping it'll be faster
than sorting.

--
greg

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Greg Stark 2006-02-18 07:36:25 Re: view of weekly data
Previous Message Tom Lane 2006-02-18 00:47:22 Re: two count columns?