From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | Florian Pflug <fgp(at)phlo(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [PATCH] Negative Transition Aggregate Functions (WIP) |
Date: | 2014-01-22 09:54:44 |
Message-ID: | CAApHDvr_oSpvM-XXz43eCMX8n0EfshJ=9j+rxvGqCy91YR-YQw@mail.gmail.com |
Lists: | pgsql-hackers |
On Sun, Jan 19, 2014 at 3:22 AM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
> On Jan18, 2014, at 06:15 , David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>> Note that after this fix the results for my quick benchmark now look like:
>>
>> create table num (num numeric(10,2) not null);
>> insert into num (num) select * from generate_series(1,20000);
>> select sum(num) over(order by num rows between current row and unbounded
>> following) from num; -- 113 ms
>> select sum(num / 10) over(order by num rows between current row and
>> unbounded following) from num; -- 156 ms
>> select sum(num / 1) over(order by num rows between current row and
>> unbounded following) from num; -- 144 ms
>>
>> So it seems to be much less prone to falling back to brute force forward
>> transitions.
>> It also seems the / 10 version must have had to previously do 1 brute
>> force rescan but now it looks like it can do it in 1 scan.
I've performed some more benchmarks on this patch tonight. The results and
full recreation scripts are attached along with the patch it was tested
against.
The benchmarks cover a mix of different possible workloads; I've tried to
show both the best and worst case for everything I could think of that has
one. The best case with the patched version is naturally very good when the
frame end is unbounded following and the frame head moves down on each row,
provided the inverse transition can actually take place.
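To illustrate where the gain comes from, here is a rough Python sketch (not the patch's actual C implementation) of that best-case frame type: with an invertible aggregate like sum(), each row's result can be derived from the previous one by removing the departing frame head, instead of restarting the aggregate for every row.

```python
def window_sums_inverse(values):
    # Frame "current row and unbounded following": the frame head moves
    # down one row at a time.  Build the full-frame sum once, then peel
    # off each departing head value with an inverse transition: O(n).
    results = []
    state = sum(values)
    for v in values:
        results.append(state)
        state -= v  # inverse transition: remove the old frame head
    return results

def window_sums_restart(values):
    # Without inverse transitions the aggregate restarts for every row,
    # re-adding the whole remaining frame each time: O(n^2).
    return [sum(values[i:]) for i in range(len(values))]
```

Both produce the same results; only the amount of work differs, which is where the 1000x figures below come from.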
I've noticed a slight performance regression in the worst case for max()
and min() aggregates. Note that these inverse transition functions are
implemented to return failure when the value being removed is equal to the
current state: e.g. if max() is 10 and we remove 9, we can report success,
but if max is 10 and we remove 10 we must report failure. The failed
attempt to remove in this case costs us a little more than it did
previously.
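In Python-flavoured pseudocode (a sketch of the behaviour described above, not the real C function), the inverse transition for max() looks something like:

```python
def max_inverse(state, removed):
    # Removing a value strictly below the current maximum cannot change
    # the maximum, so the existing state is still valid.  Removing a
    # value equal to the maximum might leave a smaller maximum behind,
    # and we can't know which without rescanning the frame, so we
    # report failure (None) and the executor falls back to recomputing
    # the aggregate from scratch.
    if removed < state:
        return state
    return None
```

The worst-case query below hits the failure branch on every row, which is where the 2-3% regression comes from.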
I'm not quite sure how likely it is to be a realistic workload, but a query
such as:
select max(id) over (order by id DESC rows between current row and
unbounded following) from test_numeric;
will fail to perform inverse transitions on *EVERY* row.
For int types this regression is only about 2-3%, but it will likely
increase as the comparison function gets more expensive. I wondered whether
it would be worth checking the window's ORDER BY against the aggregate's
sort operator and skipping inverse transitions in that case. Although a
query such as this:
select first_value(id) over (order by id DESC rows between current row and
unbounded following) from test_numeric;
would give the user the same results and would likely be much faster
anyway, so perhaps detecting that case is not worth it.
The performance regression seems a bit unavoidable in the case where the
ORDER BY column is not the same as max()'s column but the two columns
happen to sort the results the same way; I can't really think of a way to
detect this. Then again, you win back what you lost if you can perform even
just one inverse transition during the entire processing of the window, so
that case is probably quite unlikely to hurt when compared against the
cases it improves.
I focused quite a bit on testing the performance of SUM(numeric), due to
the few extra lines of code I had to add to do_numeric_accum() to track the
maximum dscale of the inputs. My plain aggregate test on this showed no
measurable regression, though there will be a couple of extra CPU cycles on
each call to the function.
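For context, the reason sum(numeric) needs that extra bookkeeping is that the result's display scale follows the largest dscale among the inputs, and removing the input that carried it would leave the correct scale unknown. A rough Python sketch of that idea follows; the state layout and function names here are illustrative assumptions, not the patch's actual code:

```python
def accum_dscale(state, dscale):
    # Forward-transition bookkeeping: remember the largest dscale seen
    # so far and how many inputs currently carry it.
    max_dscale, count_at_max = state
    if dscale > max_dscale:
        return (dscale, 1)
    if dscale == max_dscale:
        return (max_dscale, count_at_max + 1)
    return state

def inverse_dscale(state, dscale):
    # Inverse transition: removing the last input at the maximum dscale
    # means the new maximum is unknown without a rescan, so report
    # failure (None); otherwise just decrement the count.
    max_dscale, count_at_max = state
    if dscale == max_dscale:
        if count_at_max == 1:
            return None
        return (max_dscale, count_at_max - 1)
    return state
```

This is why the earlier sum(num / 10) benchmark, whose inputs have a larger dscale, behaves differently from the plain sum(num) case.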
I didn't bother doing much performance testing on aggregates like count(*)
and sum(int), as these can always perform inverse transitions; the best
cases for each of the other queries should serve as a guide for how much
performance will increase.
I'm pretty happy to see that, for as few as 20 thousand rows, the patch can
increase performance by up to 1000 times! And this increase will only get
bigger with more rows.
Here are some sample results from the attached txt file:
select sum(num_fixed_scale) over(order by id rows between current row and
unbounded following) from test_numeric;
Results in milliseconds
Patched:
50.958
63.820
64.719
52.860
64.921
Unpatched:
61358.856
61180.370
62642.495
61416.385
62383.012
Comp.     Patched    Unpatched
average   59.4556    61796.2236    Avg increase (times):    1039.367589  (103936.76%)
median    63.82      61416.385     Median increase (times):  962.3375901  (96233.76%)
If anyone can think of any other workloads that they would like tested
please let me know.
Regards
David Rowley.
Attachment | Content-Type | Size |
---|---|---|
benchmarks.txt | text/plain | 7.6 KB |
inverse_transition_functions_5b0f391_2014-01-22.patch.gz | application/x-gzip | 36.0 KB |