Large Table - Slow Window Functions (Better Approach?)

From: Jeff Adams - NOAA Affiliate <jeff(dot)adams(at)noaa(dot)gov>
To: pgsql-performance(at)postgresql(dot)org
Subject: Large Table - Slow Window Functions (Better Approach?)
Date: 2013-03-11 14:27:17
Message-ID: CA+BdxK9ko0L=_2aEimOsOUiOadxG3wE8TyzT7dOms9FF0ORyOA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Greetings,

I have a large table (~90 million rows) containing vessel positions. In
addition to a column that contains the location information (the_geom), the
table also contains two columns that are used to uniquely identify the
vessel (mmsi and name) and a column containing the Unix time (epoch) at
which the position information was logged. I frequently need to assign
records to vessel transits. To do this, I currently create a CTE that uses
a Window function (partitioning the data by mmsi and name ordered by epoch)
to examine the time that has elapsed between successive position reports
for individual vessels. For every position record for a vessel (as
identified using mmsi and name), if the time elapsed between the current
position record and the previous record (using the lag function) is less
than or equal to 2 hours, I assign the record a value of 0 to a CTE column
named tr_index. If the time elapsed is greater than 2 hours, I assign the
record a value of 1 to the tr_index column. I then use the CTE to generate
transit numbers by summing the values in the tr_index field across a Window
that also partitions the data by mmsi and name and is ordered by epoch.
This works, but is very slow (hours). The table is indexed (multi-column
index on mmsi, name and index on epoch). Does anyone see a way to get what
I am after in a more efficient manner. What I am after is an assignment of
transit number to vessels' position records based on whether the records
were within two hours of each other. The SQL that I used is provided below.
Any advice would be greatly appreciated...

WITH

cte_01 AS

(

SELECT

a.id,

a.mmsi,

a.name,

a.epoch,

a.the_geom

CASE

WHEN ((a.epoch - lag(a.epoch) OVER w) / 60) > 120 THEN 1

ELSE 0

END AS tr_index

FROM table a

WINDOW w AS (PARTITION BY a.mmsi, a.name ORDER BY a.epoch)

)

SELECT

a.id,

a.mmsi,

a.name,

a.epoch,

a.the_geom,

1 + sum(a.tr_index) OVER w AS transit,

a.active

FROM cte_01 a

WINDOW w AS (PARTITION BY a.mmsi, a.name ORDER BY a.epoch)

--
Jeff

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Jon Nelson 2013-03-11 14:47:06 Re: sniff test on some PG 8.4 numbers
Previous Message Greg Smith 2013-03-11 04:28:18 Re: sniff test on some PG 8.4 numbers