From: | Simon Riggs <simon(at)2ndquadrant(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Rod Taylor <pg(at)rbt(dot)ca>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Large Scale Aggregation (HashAgg Enhancement) |
Date: | 2006-01-16 19:19:06 |
Message-ID: | 1137439146.3180.171.camel@localhost.localdomain |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, 2006-01-16 at 12:36 -0500, Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > On Mon, 2006-01-16 at 00:07 -0500, Rod Taylor wrote:
> >> A couple of days ago I found myself wanting to aggregate 3 Billion
> >> tuples down to 100 Million tuples based on an integer key with six
> >> integer values -- six sum()'s.
>
> > There is already hash table overflow (spill to disk) logic in HashJoins,
> > so this should be possible by reusing that code for HashAggs. That's on
> > my todo list, but I'd welcome any assistance.
>
> Yeah, I proposed something similar awhile back in conjunction with
> fixing the spill logic for hash joins (which was always there, but was
> not adaptive until recently). I got the join part done but got
> distracted before fixing HashAgg :-(
You've done the main work. :-)
> The tricky part is to preserve the existing guarantee that tuples are
> merged into their aggregate in arrival order. (This does not matter for
> the standard aggregates but it definitely does for custom aggregates,
> and there will be unhappy villagers appearing on our doorsteps if we
> break it.) I think this can work correctly under the above sketch but
> it needs to be verified. It might require different handling of the
> TODO files than what hashjoin does.
For HJ we write each outer tuple to its own file-per-batch in the order
they arrive. Reading them back in preserves the original ordering. So
yes, caution required, but I see no difficulty, just reworking the HJ
code (nodeHashjoin and nodeHash). What else do you see?
Best Regards, Simon Riggs
From | Date | Subject | |
---|---|---|---|
Next Message | Bruce Momjian | 2006-01-16 19:20:30 | Re: PostgreSQL win32 & NT4 |
Previous Message | Simon Riggs | 2006-01-16 19:02:56 | Re: Large Scale Aggregation (HashAgg Enhancement) |