Re: proposal: window function - change_number

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Mart Kelder <mart(at)kelder31(dot)nl>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: window function - change_number
Date: 2014-09-21 15:20:28
Message-ID: CAFj8pRDoN253ZzGbuvS+Hpb8-n-NZR=X4M2v+vsF4oQOi4dzMg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2014-09-21 17:00 GMT+02:00 Mart Kelder <mart(at)kelder31(dot)nl>:

> Hi Pavel (and others),
>
> Op zondag 21 september 2014 15:35:46 schreef u:
> > 2014-09-21 14:30 GMT+02:00 Mart Kelder <mart(at)kelder31(dot)nl>:
> > > Hi Pavel (and others),
> > >
> > > Pavel Stehule wrote:
> > > > Hi
> > > > I tried to solve following task:
> > > >
> > > > I have a table
> > > >
> > > > start, reason, km
> > > > =============
> > > >
> > > > 2014-01-01 08:00:00, private, 10
> > > > 2014-01-01 09:00:00, commerc, 20
> > > > 2014-01-01 10:00:00, commerc, 20
> > > > 2014-01-01 11:00:00, private, 8
> > > >
> > > > and I would reduce these rows to
> > > >
> > > > 2014-01-01 08:00:00, private, 10
> > > > 2014-01-01 09:00:00, commerc, 20 + 20 = 40
> > > > 2014-01-01 11:00:00, private, 8
> > > >
> > > > It is relative hard to it now with SQL only. But we can simplify this
> > > > task
> > > >
> > > > with window function that returns number of change in some column.
> Then
> > > > this task can be solved by
> > > >
> > > > select min(start), min(reason), sum(km)
> > > > from (select start, reason, km, change_number(reason) over (order
> by
> > > > start))
> > > >
> > > > group by change_number;
> > >
> > > What about
> > >
> > > select srk.reason, min(srk.start), sum(srk.km)
> > > from start_reason_km srk
> > > group by srk.reason, (select max(start) from start_reason_km other
> WHERE
> > > other.start < srk.start and other.reason != srk.reason);
> >
> > This query is Cartesian product, so for some large data it is
> significantly
> > slower then window function (required only sorts without joins)
>
> I think we have the same queryplan in mind (with only one scan). As far as
> I
> know, SQL is a language where you define the result you want to get, and
> let
> the server find a way how to find the data. I think windowed function also
> say
> how the server needs to get the information.
>

What I know It is not true now. Any subselect enforce individual scan of
source relation. Postgres has no any special support for selfjoin.

>
> The real challenge is of course of finding heuristics to remove the
> additional
> join. In this particular case, I can tell how to remove the inner join from
> the subquery:
> * the where-clause of the self-join contains other.start < srk.start. From
> that we can conclude that if the table is (or can be) sorted on "start", we
> have seen the data before the subquery is executed
> * because we only need an aggregate, we need to store the intermediate
> "max"
> for each "reason". And then add the result to the stream.
>
> Recently, I had a simular problem with a table containing a timestamp, a
> state
> and a object where the state belongs to. A object remains in a state until
> there is a more recent tuple in the table. I needed basically to query all
> the
> previous state for each tuple, but preverably without the additional join.
>
> > My motivation was a) to implement described task without Cartesian
> product.
>
Good reason (if you consider the queryplan and not the query).
>
>
yes.

There is probably big space for improvements in more directions. For
example - we have application, where is often used pattern SELECT FROM A
JOIN (SELECT someagg() FROM A) .. ON

Sometimes these queries are slow due terrible low estimation. It is one
example of more

Pavel

> > b) introduce some tool for this kind of problems. I seen more times a
> > request .. reduce a time series, and a window function "change_number"
> (or
> > maybe "consistent_series_number") can be good candidate.
>
> I also need to note that there is a lot of difference in complexity
> between the
> possible solutions of this problem. Where a new window function can
> probably
> be very easy implemented, the optimizer changes descripted above are
> complex
> and not easy to implement.
>
> I also want to note that I am not really against new window functions, I
> only
> want to point out that a more generic solution also might be possible.
>
> Regards,
>
> Mart
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Gierth 2014-09-21 15:51:34 Re: proposal: window function - change_number
Previous Message Mart Kelder 2014-09-21 15:00:27 Re: proposal: window function - change_number