Re: Improving avg performance for numeric

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Hadi Moshayedi <hadi(at)moshayedi(dot)net>, Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improving avg performance for numeric
Date: 2013-11-17 00:29:16
Message-ID: 9796.1384648156@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
> [ numeric-optimize-v8.patch ]

I've committed this with some adjustments. Aside from cosmetic
changes and documentation/comment improvements:

* I don't agree at all with the claim that aggtransspace is only useful
for INTERNAL transition data. There are lots of cases with regular
SQL types where the planner doesn't have a good idea of how large the
transition value will be, and with the current code it tends to guess
on the small side (frequently 32 bytes :-(). It seems to me to be
useful to give users a knob to twiddle there. Consider for example
an aggregate that uses an integer array as transition state; the author
of the aggregate might know that there are usually hundreds of entries
in the array, and wish to dial up aggtransspace to prevent the planner
from optimistically choosing hash aggregation.

As committed, I just took out the restriction in CREATE AGGREGATE
altogether; it will let you set SSPACE for any transition datatype.
The planner will ignore it for pass-by-value types, where the behavior
is known, but otherwise honor it. It's possible that we should put in
a restriction to INTERNAL plus pass-by-reference types, or even INTERNAL
plus pass-by-reference variable-length types, but I can't get excited
about that. The error message would be too confusing I think.

* There was a stray "else" added to clauses.c that disabled space
accounting for polymorphic aggregates.

* I simplified the summing code in do_numeric_accum; the copying and
reallocating it was doing was not only unnecessary but contained
unpleasant assumptions about what you can do with a NumericVar. This
probably makes the committed patch a bit faster than what was submitted,
but I didn't try to benchmark the change.

* I made sure all the functions accessed their input state pointer with
state = PG_ARGISNULL(0) ? NULL : (NumericAggState *) PG_GETARG_POINTER(0);
There were places that omitted the ARGISNULL test, and thus only happened
to work if the uninitialized Datum value they got was zero. While that
might chance to be true in the current state of the code, it's not an
assumption you're supposed to make.

* numeric sum/avg failed to check for NaN inputs.

* I fixed incorrect tests in the code added to pg_dump.

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2013-11-17 00:36:35 Re: Storing pg_stat_statements query texts externally, pg_stat_statements in core
Previous Message Josh Berkus 2013-11-17 00:13:19 Re: additional json functionality