From: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
Cc: | David Steele <david(at)pgmasters(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila(at)enterprisedb(dot)com> |
Subject: | Re: Combining Aggregates |
Date: | 2016-03-20 03:48:44 |
Message-ID: | CAKJS1f8AtOiUaQb1ipz--K5G1aXH4+GUEsttAYS9GpKhk2_9tA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 17 March 2016 at 14:25, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> On 03/16/2016 12:03 PM, David Rowley wrote:
>> Patch 2:
>> This adds the serial/deserial aggregate infrastructure, pg_dump
>> support, CREATE AGGREGATE changes, and nodeAgg.c changes to have it
>> serialise and deserialise aggregate states when instructed to do so.
>>
>> Patch 3:
>> This adds a boat load of serial/deserial functions, and combine
>> functions for most of the built-in numerical aggregate functions. It
>> also contains some regression tests which should really be in patch 2,
>> but I with patch 2 there's no suitable serialisation or
>> de-serialisation functions to test CREATE AGGREGATE with. I think
>> having them here is ok, as patch 2 is quite useless without patch 3
>> anyway.
>
> I don't see how you could move the tests into #2 when the functions are
> defined in #3? IMHO this is the right place for the regression tests.
Yeah, but the CREATE AGGREGATE changes which are being tested in 0003
were actually added in 0002. I think 0002 is the right place to test
the changes to CREATE AGGREGATE, but since there's a complete lack of
functions to use, then I've just delayed until 0003.
>> Another thing to note about this patch is that I've gone and created
>> serial/de-serial functions for when PolyNumAggState both require
>> sumX2, and don't require sumX2. I had thought about perhaps putting an
>> extra byte in the serial format to indicate if a sumX2 is included,
>> but I ended up not doing it this way. I don't really want these serial
>> formats getting too complex as we might like to do fun things like
>> pass them along to sharded servers one day, so it might be nice to
>> keep them simple.
>
>
> Hmmm, I've noticed that while eyeballing the diffs, and I'm not sure if it's
> worth the additional complexity at this point. I mean, one byte is hardly
> going to make a measurable difference - we're probably wasting more than
> that due to alignment, for example.
I don't think any alignment gets done here. Look at pq_sendint().
Did you mean the complexity of having extra functions, or having the
extra byte to check in the deserial func?
> As you've mentioned sharded servers - how stable should the serialized
> format be? I've assumed it to be transient, i.e. in the extreme case it
> might change after restarting a database - for the parallel aggregate that's
> clearly sufficient.
>
> But if we expect this to eventually work in a sharded environment, that's
> going to be way more complicated. I guess in these cases we could rely on
> implicit format versioning, via the major the version (doubt we'd tweak the
> format in a minor version anyway).
>
> I'm not sure the sharding is particularly useful argument at this point,
> considering we don't really know if the current format is actually a good
> match for that.
Perhaps you're right. At this stage I've no idea if we'd want to
support shards on varying major versions. I think probably not, since
the node write functions might not be compatible with the node read
functions on the other server.
I've attached another series of patches:
0001: This is the latest Parallel Aggregate Patch, not intended for
review here, but is required for the remaining patches. This patch has
changed quite a bit from the previous one that I posted here, and the
remaining patches needed re-based due to those changes.
0002: Adds serial/de-serial function support to CREATE AGGREGATE,
contains minor fix-ups from last version.
0003: Adds various combine/serial/de-serial functions for the standard
set of aggregates, apart from most float8 aggregates.
0004: Adds regression tests for 0003 pg_aggregate.h changes.
0005: Documents to mention which aggregate functions support partial mode.
0006: Re-based version of Haribabu's floating point aggregate support,
containing some fixes by me.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment | Content-Type | Size |
---|---|---|
0001-Allow-aggregation-to-happen-in-parallel_2016-03-20.patch | application/octet-stream | 50.9 KB |
0002-Allow-INTERNAL-state-aggregates-to-participate-in-pa_2016-03-20.patch | application/octet-stream | 96.8 KB |
0003-Add-various-aggregate-combine-and-serialize-de-seria_2016-03-20.patch | application/octet-stream | 60.9 KB |
0004-Add-sanity-regression-tests-for-new-aggregate-serial_2016-03-20.patch | application/octet-stream | 4.7 KB |
0005-Add-documents-to-explain-which-aggregates-support-pa_2016-03-20.patch | application/octet-stream | 16.7 KB |
0006-Add-combine-functions-for-various-floating-point-agg_2016-03-20.patch | application/octet-stream | 27.9 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2016-03-20 04:02:23 | Re: VS 2015 support in src/tools/msvc |
Previous Message | David Rowley | 2016-03-20 03:23:03 | Re: Combining Aggregates |