Re: Large Table - Slow Window Functions (Better Approach?)

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Jeff Adams - NOAA Affiliate <jeff(dot)adams(at)noaa(dot)gov>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Large Table - Slow Window Functions (Better Approach?)
Date: 2013-03-11 15:34:17
Message-ID: CAFj8pRAS+A9pOLF+fgW=QWaa9kCJ+FfOvVEXDYwAbowShA0UEA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

2013/3/11 Jeff Adams - NOAA Affiliate <jeff(dot)adams(at)noaa(dot)gov>:
> Pavel,
>
> Thanks for the response. I have not yet had the opportunity to use cursors,
> but am now curious. Could you perhaps provide a bit more detail as to what
> the implementation of your suggested approach would look like?

an example:

$$
DECLARE
r record;
prev_r record;

BEGIN
FOR r IN SELECT * FROM a ORDER BY epoch, mmsi
LOOP
IF prev_r IS NOT NULL THEN
/* do some counting */
prev_r contains previous row, r contains current row
do some
RETURN NEXT .. /* return data in defined order */
END IF;
prev_r = r;
END LOOP;

Probably slow part of your query is sorting - first can be accelerated
by index, but second (as CTE result cannot) - you can try increase
work_mem ??

Regards

Pavel

>
>
> On Mon, Mar 11, 2013 at 11:03 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
> wrote:
>>
>> Hello
>>
>> you can try procedural solution - use a cursor over ordered data in
>> plpgsql and returns table
>>
>> Regards
>>
>> Pavel Stehule
>>
>> 2013/3/11 Jeff Adams - NOAA Affiliate <jeff(dot)adams(at)noaa(dot)gov>:
>> > Greetings,
>> >
>> >
>> >
>> > I have a large table (~90 million rows) containing vessel positions. In
>> > addition to a column that contains the location information (the_geom),
>> > the
>> > table also contains two columns that are used to uniquely identify the
>> > vessel (mmsi and name) and a column containing the Unix time (epoch) at
>> > which the position information was logged. I frequently need to assign
>> > records to vessel transits. To do this, I currently create a CTE that
>> > uses a
>> > Window function (partitioning the data by mmsi and name ordered by
>> > epoch) to
>> > examine the time that has elapsed between successive position reports
>> > for
>> > individual vessels. For every position record for a vessel (as
>> > identified
>> > using mmsi and name), if the time elapsed between the current position
>> > record and the previous record (using the lag function) is less than or
>> > equal to 2 hours, I assign the record a value of 0 to a CTE column named
>> > tr_index. If the time elapsed is greater than 2 hours, I assign the
>> > record a
>> > value of 1 to the tr_index column. I then use the CTE to generate
>> > transit
>> > numbers by summing the values in the tr_index field across a Window that
>> > also partitions the data by mmsi and name and is ordered by epoch. This
>> > works, but is very slow (hours). The table is indexed (multi-column
>> > index on
>> > mmsi, name and index on epoch). Does anyone see a way to get what I am
>> > after
>> > in a more efficient manner. What I am after is an assignment of transit
>> > number to vessels' position records based on whether the records were
>> > within
>> > two hours of each other. The SQL that I used is provided below. Any
>> > advice
>> > would be greatly appreciated...
>> >
>> >
>> >
>> > WITH
>> >
>> > cte_01 AS
>> >
>> > (
>> >
>> > SELECT
>> >
>> > a.id,
>> >
>> > a.mmsi,
>> >
>> > a.name,
>> >
>> > a.epoch,
>> >
>> > a.the_geom
>> >
>> > CASE
>> >
>> > WHEN ((a.epoch - lag(a.epoch) OVER w) / 60) > 120 THEN 1
>> >
>> > ELSE 0
>> >
>> > END AS tr_index
>> >
>> > FROM table a
>> >
>> > WINDOW w AS (PARTITION BY a.mmsi, a.name ORDER BY a.epoch)
>> >
>> > )
>> >
>> >
>> >
>> >
>> >
>> > SELECT
>> >
>> > a.id,
>> >
>> > a.mmsi,
>> >
>> > a.name,
>> >
>> > a.epoch,
>> >
>> > a.the_geom,
>> >
>> > 1 + sum(a.tr_index) OVER w AS transit,
>> >
>> > a.active
>> >
>> > FROM cte_01 a
>> >
>> > WINDOW w AS (PARTITION BY a.mmsi, a.name ORDER BY a.epoch)
>> >
>> >
>> >
>> > --
>> > Jeff
>
>
>
>
> --
> Jeffrey D. Adams
> Contractor
> OAI, Inc.
> In support of:
> National Marine Fisheries Service
> Office of Protected Resources
> 1315 East West Hwy, Building SSMC3
> Silver Spring, MD 20910-3282
> phone: (301) 427-8434
> fax: (301) 713-0376

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Jeff Adams - NOAA Affiliate 2013-03-11 15:48:17 Re: Large Table - Slow Window Functions (Better Approach?)
Previous Message Jeff Adams - NOAA Affiliate 2013-03-11 15:20:07 Re: Large Table - Slow Window Functions (Better Approach?)