From: | Jeff Adams - NOAA Affiliate <jeff(dot)adams(at)noaa(dot)gov> |
---|---|
To: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Large Table - Slow Window Functions (Better Approach?) |
Date: | 2013-03-11 15:20:07 |
Message-ID: | CA+BdxK8jW7L5jNHD6kidD3MT2bSDGsTSKOHwmQ_o+xTcb71ORg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Pavel,
Thanks for the response. I have not yet had the opportunity to use cursors,
but am now curious. Could you perhaps provide a bit more detail as to what
the implementation of your suggested approach would look like?
On Mon, Mar 11, 2013 at 11:03 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>wrote:
> Hello
>
> you can try procedural solution - use a cursor over ordered data in
> plpgsql and returns table
>
> Regards
>
> Pavel Stehule
>
> 2013/3/11 Jeff Adams - NOAA Affiliate <jeff(dot)adams(at)noaa(dot)gov>:
> > Greetings,
> >
> >
> >
> > I have a large table (~90 million rows) containing vessel positions. In
> > addition to a column that contains the location information (the_geom),
> the
> > table also contains two columns that are used to uniquely identify the
> > vessel (mmsi and name) and a column containing the Unix time (epoch) at
> > which the position information was logged. I frequently need to assign
> > records to vessel transits. To do this, I currently create a CTE that
> uses a
> > Window function (partitioning the data by mmsi and name ordered by
> epoch) to
> > examine the time that has elapsed between successive position reports for
> > individual vessels. For every position record for a vessel (as identified
> > using mmsi and name), if the time elapsed between the current position
> > record and the previous record (using the lag function) is less than or
> > equal to 2 hours, I assign the record a value of 0 to a CTE column named
> > tr_index. If the time elapsed is greater than 2 hours, I assign the
> record a
> > value of 1 to the tr_index column. I then use the CTE to generate transit
> > numbers by summing the values in the tr_index field across a Window that
> > also partitions the data by mmsi and name and is ordered by epoch. This
> > works, but is very slow (hours). The table is indexed (multi-column
> index on
> > mmsi, name and index on epoch). Does anyone see a way to get what I am
> after
> > in a more efficient manner. What I am after is an assignment of transit
> > number to vessels' position records based on whether the records were
> within
> > two hours of each other. The SQL that I used is provided below. Any
> advice
> > would be greatly appreciated...
> >
> >
> >
> > WITH
> >
> > cte_01 AS
> >
> > (
> >
> > SELECT
> >
> > a.id,
> >
> > a.mmsi,
> >
> > a.name,
> >
> > a.epoch,
> >
> > a.the_geom
> >
> > CASE
> >
> > WHEN ((a.epoch - lag(a.epoch) OVER w) / 60) > 120 THEN 1
> >
> > ELSE 0
> >
> > END AS tr_index
> >
> > FROM table a
> >
> > WINDOW w AS (PARTITION BY a.mmsi, a.name ORDER BY a.epoch)
> >
> > )
> >
> >
> >
> >
> >
> > SELECT
> >
> > a.id,
> >
> > a.mmsi,
> >
> > a.name,
> >
> > a.epoch,
> >
> > a.the_geom,
> >
> > 1 + sum(a.tr_index) OVER w AS transit,
> >
> > a.active
> >
> > FROM cte_01 a
> >
> > WINDOW w AS (PARTITION BY a.mmsi, a.name ORDER BY a.epoch)
> >
> >
> >
> > --
> > Jeff
>
--
Jeffrey D. Adams
Contractor
OAI, Inc.
In support of:
National Marine Fisheries Service
Office of Protected Resources
1315 East West Hwy, Building SSMC3
Silver Spring, MD 20910-3282
phone: (301) 427-8434
fax: (301) 713-0376
From | Date | Subject | |
---|---|---|---|
Next Message | Pavel Stehule | 2013-03-11 15:34:17 | Re: Large Table - Slow Window Functions (Better Approach?) |
Previous Message | Pavel Stehule | 2013-03-11 15:03:38 | Re: Large Table - Slow Window Functions (Better Approach?) |