Re: Tricky math in PSQL

From: Kip Warner <kip(at)thevertigo(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Kevin Grittner <kgrittn(at)gmail(dot)com>, "pgsql-novice(at)postgresql(dot)org" <pgsql-novice(at)postgresql(dot)org>
Subject: Re: Tricky math in PSQL
Date: 2016-04-27 07:09:44
Message-ID: 1461740984.28754.14.camel@thevertigo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

On Tue, 2016-04-26 at 09:13 -0500, Merlin Moncure wrote:
> I'm not going to work out the whole problem for you, but you probably
> want to start looking in the direction custom aggregate function.
> Most people are not aware of this but custom aggregates can be
> applied over a window. The state can be a composite type and thus
> hold arbitrarily complex information that you can maintain as you
> walk through the set (this works great for time series and things
> like that).
>
> Barring that, if you are still looking for solutions to this type of
> problem in the database, I'd point you in the direction of pl/r. The
> learning curve is somewhat steep but rewards mastery.
>
> CREATE AGGREGATE RateChange(NUMERIC, NUMERIC, INT)
> (
> SFUNC = AggRateChange,
> STYPE = NUMERIC
> );
>
> CREATE OR REPLACE FUNCTION AggRateChange(
> State NUMERIC,
> Seed NUMERIC,
> RateChange NUMERIC,
> NDigits INT) RETURNS NUMERIC AS
> $$
> SELECT ROUND(COALESCE($1, $2) * (1.0 + $3), NDigits);
> $$ LANGUAGE SQL IMMUTABLE;
>
> select ratechange(100, (random() * 0.05)::numeric, 3) over(order by
> s)
> from generate_series(1,10) s;
> ratechange
> ────────────
> 101.524
> 104.570
> 104.674
> 106.599
> 111.685
> 114.583
> 120.174
> 125.561
> 130.291

Hey Merlin,

Thank you for your thoughtful suggestions. I spent the better part of
the day wrestling with some refactored schema and queries with a great
deal of help from the kind folks in #postgresql. This is what I've got
now, albeit without custom aggregate functions:

http://pastebin.com/zwWmYMd0

As you can see, it does appear to work, but it's a problem of doing so
efficiently now. The solution I have around line 192 to 208 performs
total_diff() for a given id against all others, but with some constants
removed. It does work, but for a statistic table of 1,000,000 rows and
a statistics_child table of 101,948,612, the query takes approximately
4m 11s on my machine.

I'd like to improve this. I wonder if providing indexes on some of the
columns in the two views would be helpful?

--
Kip Warner -- Senior Software Engineer
OpenPGP encrypted/signed mail preferred
http://www.thevertigo.com

In response to

Browse pgsql-novice by date

  From Date Subject
Next Message Hassan Schroeder 2016-04-27 15:56:19 Logging connections, not queries
Previous Message Merlin Moncure 2016-04-26 14:13:44 Re: Tricky math in PSQL