From: | Kip Warner <kip(at)thevertigo(dot)com> |
---|---|
To: | Merlin Moncure <mmoncure(at)gmail(dot)com> |
Cc: | Kevin Grittner <kgrittn(at)gmail(dot)com>, "pgsql-novice(at)postgresql(dot)org" <pgsql-novice(at)postgresql(dot)org> |
Subject: | Re: Tricky math in PSQL |
Date: | 2016-04-27 07:09:44 |
Message-ID: | 1461740984.28754.14.camel@thevertigo.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-novice |
On Tue, 2016-04-26 at 09:13 -0500, Merlin Moncure wrote:
> I'm not going to work out the whole problem for you, but you probably
> want to start looking in the direction custom aggregate function.
> Most people are not aware of this but custom aggregates can be
> applied over a window. The state can be a composite type and thus
> hold arbitrarily complex information that you can maintain as you
> walk through the set (this works great for time series and things
> like that).
>
> Barring that, if you are still looking for solutions to this type of
> problem in the database, I'd point you in the direction of pl/r. The
> learning curve is somewhat steep but rewards mastery.
>
> CREATE AGGREGATE RateChange(NUMERIC, NUMERIC, INT)
> (
> SFUNC = AggRateChange,
> STYPE = NUMERIC
> );
>
> CREATE OR REPLACE FUNCTION AggRateChange(
> State NUMERIC,
> Seed NUMERIC,
> RateChange NUMERIC,
> NDigits INT) RETURNS NUMERIC AS
> $$
> SELECT ROUND(COALESCE($1, $2) * (1.0 + $3), NDigits);
> $$ LANGUAGE SQL IMMUTABLE;
>
> select ratechange(100, (random() * 0.05)::numeric, 3) over(order by
> s)
> from generate_series(1,10) s;
> ratechange
> ────────────
> 101.524
> 104.570
> 104.674
> 106.599
> 111.685
> 114.583
> 120.174
> 125.561
> 130.291
Hey Merlin,
Thank you for your thoughtful suggestions. I spent the better part of
the day wrestling with some refactored schema and queries with a great
deal of help from the kind folks in #postgresql. This is what I've got
now, albeit without custom aggregate functions:
As you can see, it does appear to work, but it's a problem of doing so
efficiently now. The solution I have around line 192 to 208 performs
total_diff() for a given id against all others, but with some constants
removed. It does work, but for a statistic table of 1,000,000 rows and
a statistics_child table of 101,948,612, the query takes approximately
4m 11s on my machine.
I'd like to improve this. I wonder if providing indexes on some of the
columns in the two views would be helpful?
--
Kip Warner -- Senior Software Engineer
OpenPGP encrypted/signed mail preferred
http://www.thevertigo.com
From | Date | Subject | |
---|---|---|---|
Next Message | Hassan Schroeder | 2016-04-27 15:56:19 | Logging connections, not queries |
Previous Message | Merlin Moncure | 2016-04-26 14:13:44 | Re: Tricky math in PSQL |