From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Jeff Davis <pgsql(at)j-davis(dot)com> |
Cc: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Spilling hashed SetOps and aggregates to disk |
Date: | 2018-06-05 12:42:30 |
Message-ID: | 20180605124230.etvuxdoulazo6zxo@alap3.anarazel.de |
Lists: | pgsql-hackers |
Hi,
On 2018-06-04 22:18:56 -0700, Jeff Davis wrote:
> On Mon, 2018-06-04 at 11:52 -0700, Andres Freund wrote:
> > I wonder whether, at least for aggregates, the better fix wouldn't be to
> > switch to feeding the tuples into tuplesort upon memory exhaustion and
> > doing a sort based aggregate. We have most of the infrastructure to do
>
> That's an interesting idea, but it seems simpler to stick to hashing
> rather than using a combination strategy. It also seems like it would
> take less CPU effort.
Isn't the locality of access going to be considerably better with the
sort based approach?
> What advantages do you have in mind? My patch partitions the spilled
> data, so it should have similar disk costs as a sort approach.
I think one part of it is that the amount of code is going to be lower:
we essentially already have all the code to handle sort based aggs, and
to have both sort and hash based aggs in the same query. We'd mostly
need a way to scan the hashtable and stuff it into a tuplesort, which
isn't hard. nodeAgg.c is already more than complex enough, and I'm not
sure that full blown partitioning is worth the cost.
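To make the idea concrete, here is a rough Python sketch of the control flow, not PostgreSQL code. It assumes a simple sum aggregate, uses a plain list in place of a tuplesort, and uses a hypothetical max_groups limit in place of a real memory budget; the function name is made up for illustration.

```python
from itertools import groupby
from operator import itemgetter


def hash_then_sort_sum(rows, max_groups):
    """Sum values per key, switching from hashing to sorting when full.

    max_groups is a hypothetical stand-in for a memory budget.
    """
    table = {}    # in-memory hash table: key -> partial sum
    spill = None  # stands in for a tuplesort; None while still hashing

    for key, value in rows:
        if spill is None:
            if key in table or len(table) < max_groups:
                table[key] = table.get(key, 0) + value
                continue
            # Memory exhausted: scan the hash table, move its partial
            # states into the sort, and stop hashing from here on.
            spill = list(table.items())
            table.clear()
        spill.append((key, value))

    if spill is None:
        # Everything fit in memory, so this is plain hash aggregation.
        return dict(table)

    # Sort based aggregate: equal keys are adjacent after sorting, so
    # partial states and raw values for a key are combined in one pass.
    spill.sort(key=itemgetter(0))
    return {
        key: sum(value for _, value in group)
        for key, group in groupby(spill, key=itemgetter(0))
    }


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5), ("c", 6)]
result = hash_then_sort_sum(rows, max_groups=2)
```

Moving partial states into the sort only works this directly because partial sums combine the same way raw values do; aggregates without such a combine step would need different handling.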
Greetings,
Andres Freund