From: | Noah Misch <noah(at)leadboat(dot)com> |
---|---|
To: | Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com> |
Subject: | Re: Final Patch for GROUPING SETS |
Date: | 2014-12-21 21:00:05 |
Message-ID: | 20141221210005.GA1864976@tornado.leadboat.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sat, Dec 13, 2014 at 04:37:48AM +0000, Andrew Gierth wrote:
> >>>>> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>
> >> I'd already explained in more detail way back when we posted the
> >> patch. But to reiterate: the ChainAggregate nodes pass through
> >> their input data unchanged, but on group boundaries they write
> >> aggregated result rows to a tuplestore shared by the whole
> >> chain. The top node returns the data from the tuplestore after its
> >> own output is completed.
>
> Tom> That seems pretty grotty from a performance+memory consumption
> Tom> standpoint. At peak memory usage, each one of the Sort nodes
> Tom> will contain every input row,
>
> Has this objection ever been raised for WindowAgg, which has the same
> issue?
I caution against using window function performance as the template for
GROUPING SETS performance goals. The benefit of GROUPING SETS compared to its
UNION ALL functional equivalent is 15% syntactic pleasantness, 85% performance
opportunities. Contrast that having window functions is great even with naive
performance, because they enable tasks that are otherwise too hard in SQL.
> Tom> In principle, with the CTE+UNION approach I was suggesting, the
> Tom> peak memory consumption would be one copy of the input rows in
> Tom> the CTE's tuplestore plus one copy in the active branch's Sort
> Tom> node. I think a bit of effort would be needed to get there (ie,
> Tom> shut down one branch's Sort node before starting the next,
> Tom> something I'm pretty sure doesn't happen today).
>
> Correct, it doesn't.
>
> However, I notice that having ChainAggregate shut down its input would
> also have the effect of bounding the memory usage (to two copies,
> which is as good as the append+sorts+CTE case).
Agreed, and I find that more promising than the CTE approach. Both strategies
require temporary space covering two copies of the input data. (That, or you
accept rescanning the original input.) The chained approach performs less
I/O. Consider "SELECT count(*) FROM t GROUP BY GROUPING SETS (a, b)", where
pg_relation_size(t) >> RAM. I/O consumed with the chained approach:
read table
write tuplesort 1
read tuplesort 1
write tuplesort 2
read tuplesort 2
I/O consumed with the CTE approach:
read table
write CTE
read CTE
write tuplesort 1
read tuplesort 1
read CTE
write tuplesort 2
read tuplesort 2
Tom rightly brought up the space requirements for result rows. The CTE
approach naturally avoids reserving space for that. However, I find it a safe
bet to optimize GROUPING SETS for input >> result. Reserving temporary space
for result rows to save input data I/O is a good trade. We don't actually
need to compromise; one can imagine a GroupAggregateChain plan node with a
sortChain list that exhibits the efficiencies of both. I'm fine moving
forward with the cross-node tuplestore, though.
The elephant in the performance room is the absence of hash aggregation. I
agree with your decision to make that a follow-on patch, but the project would
be in an awkward PR situation if 9.5 has GroupAggregate-only GROUPING SETS. I
may argue to #ifdef-out the feature rather than release that way. We don't
need to debate that prematurely, but keep it in mind while planning.
Thanks,
nm
From | Date | Subject | |
---|---|---|---|
Next Message | Fabrízio de Royes Mello | 2014-12-21 21:30:35 | Re: Proposal "VACUUM SCHEMA" |
Previous Message | Andrew Dunstan | 2014-12-21 19:50:09 | Re: Add min and max execute statement time in pg_stat_statement |