Re: two count columns?

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Jan Danielsson <jan(dot)danielsson(at)gmail(dot)com>
Cc:	pgsql-sql(at)postgresql(dot)org
Subject:	Re: two count columns?
Date:	2006-02-18 00:51:27
Message-ID:	87mzgp1oc0.fsf@stark.xeocode.com
Lists:	pgsql-sql

Jan Danielsson <jan(dot)danielsson(at)gmail(dot)com> writes:

> select from_ip, count(from_ip) as entries, count(select * from log where
> ...) as tot_entries, max(ts)::timestamp(0) as last_access from log where
> to_port=22 and direction='in' group by from_ip

select from_ip,
       count(from_ip) as entries,
       (select count(*)
          from log as l
         where l.from_ip = log.from_ip
       ) as tot_entries,
       max(ts) as last_access
  from log
 where to_port=22
   and direction='in'
 group by from_ip

Expect it to be pretty slow, though. For every from_ip it has to look up every
other entry with that from_ip.

> Thankful for any hints or tips.

There is a trick you could use to make it faster but it gets cumbersome and
pretty tricky to use when you're doing more than one thing at a time:

select from_ip,
       sum(case when to_port=22 and direction='in' then 1 else 0 end) as entries,
       count(*) as tot_entries,
       max(case when to_port=22 and direction='in' then ts::timestamp(0) else null end) as last_access
  from log
 group by from_ip
having sum(case when to_port=22 and direction='in' then 1 else 0 end) > 0
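
To make the effect of the conditional aggregates concrete, here is a small sketch against a made-up table. The column types and the sample rows are assumptions for illustration only, not your actual schema. The HAVING clause repeats the sum(...) expression because PostgreSQL does not accept an output-column alias such as "entries" there.

```sql
-- Hypothetical miniature of the log table; the column types are assumed.
create temp table log (
    from_ip   inet,
    to_port   integer,
    direction text,
    ts        timestamp
);

-- Made-up rows: 10.0.0.1 has two inbound port-22 hits plus one other entry,
-- 10.0.0.2 has no inbound port-22 hits at all.
insert into log values
    ('10.0.0.1', 22, 'in',  '2006-02-17 10:00:00'),
    ('10.0.0.1', 22, 'in',  '2006-02-17 11:00:00'),
    ('10.0.0.1', 80, 'in',  '2006-02-17 12:00:00'),
    ('10.0.0.2', 80, 'in',  '2006-02-17 09:00:00'),
    ('10.0.0.2', 22, 'out', '2006-02-17 09:30:00');

-- One pass over the table: the CASE expressions restrict "entries" and
-- "last_access" to the inbound port-22 rows, while count(*) sees every row.
select from_ip,
       sum(case when to_port=22 and direction='in' then 1 else 0 end) as entries,
       count(*) as tot_entries,
       max(case when to_port=22 and direction='in' then ts::timestamp(0) else null end) as last_access
  from log
 group by from_ip
having sum(case when to_port=22 and direction='in' then 1 else 0 end) > 0;
```

With these rows only 10.0.0.1 comes back, with entries = 2, tot_entries = 3 and last_access = 2006-02-17 11:00:00. 10.0.0.2 is dropped by the HAVING clause because it has no matching entries.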

Note that in either case you might want to look at the EXPLAIN ANALYZE output
for the query and try raising work_mem for this query using SET until you see
the plan using a hash aggregate. If it can use a hash aggregate for your query
(more likely for the first query than the second) without swapping, it'll be
faster than sorting.
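
Roughly, that tuning loop could look like the sketch below. The work_mem value is an arbitrary placeholder rather than a recommendation, and the size-with-units syntax assumes a reasonably recent PostgreSQL release. SET only affects the current session, so other connections are untouched.

```sql
-- Raise the per-operation memory budget for this session only.
-- '64MB' is an assumed starting point; adjust it to the RAM you can spare.
set work_mem = '64MB';

-- Run the query and show the plan that was actually executed.
-- A "HashAggregate" node means the grouping was done with a hash table;
-- a "GroupAggregate" node fed by a "Sort" node means it sorted instead.
explain analyze
select from_ip,
       sum(case when to_port=22 and direction='in' then 1 else 0 end) as entries,
       count(*) as tot_entries,
       max(case when to_port=22 and direction='in' then ts::timestamp(0) else null end) as last_access
  from log
 group by from_ip
having sum(case when to_port=22 and direction='in' then 1 else 0 end) > 0;

-- Return to the server default once you are done experimenting.
reset work_mem;
```

If the plan still sorts, raise work_mem a step at a time and rerun, keeping an eye on whether the machine starts swapping.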

--
greg

In response to

two count columns? at 2006-02-17 23:56:52 from Jan Danielsson