help optimizing query

From: George Nychis <gnychis(at)cmu(dot)edu>
To: pgsql-general(at)postgresql(dot)org
Subject: help optimizing query
Date: 2008-03-22 18:48:07
Message-ID: 47E55467.6070800@cmu.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi all,

I'm looking for helping optimizing a query. It currently requires two
passes on the data per query, when I'd like to try and write it to only
require a single pass.

Here's the high level, it's parsing flow level network traffic and it's
interested in 4 fields:
src_ip, dst_ip, src_packets, dst_packets

In words:
'src_ip' sent 'src_packets' number of packets to 'dst_ip'
'dst_ip' sent 'dst_packets' number of packets to 'src_ip'

For every IP address, I want to count how many packets were sent to it.
This could come one of two ways, the IP is the source in the flow, and
it received dst_packets. Or, the IP is the destination in the flow, and
it received src_packets.

My current method is to split that into two queries and then take the
union. But, I was wondering if its possible to do this in one pass
through the data?

SELECT ip,sum(dst_packets)
FROM(
(SELECT dst_ip AS ip,sum(src_packets) AS dst_packets
FROM flows
WHERE interval='2005-02-01 00:00:00'
GROUP BY dst_ip)
UNION ALL
(SELECT src_ip AS ip,sum(dst_packets) AS dst_packets
FROM flows
WHERE interval='2005-02-01 00:00:00'
GROUP BY src_ip) )
AS stats
GROUP BY stats.ip
HAVING sum(dst_packets)>0
ORDER BY sum(dst_packets) DESC

- George

Browse pgsql-general by date

  From Date Subject
Next Message Michael Meskes 2008-03-22 20:11:12 Re: --enable-thread-safety bug
Previous Message Ivan Sergio Borgonovo 2008-03-22 18:35:52 Re: dynamically generated SQL and planner/performance