From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: David Gershuni <dgershun(at)cs(dot)cmu(dot)edu>
Cc: Claudio Freire <klaussfreire(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Spilling hashed SetOps and aggregates to disk
Date: 2018-06-21 23:57:05
Message-ID: 1529625425.3820.81.camel@j-davis.com
Lists: pgsql-hackers
On Thu, 2018-06-21 at 13:44 -0700, David Gershuni wrote:
> To handle hash collisions, we can do the following:
>
> 1) We track the current hash code we’re processing, in ascending
> order.
>
> 2) Instead of combining one group at a time, we’ll maintain a list of
> all groups we’ve seen that match the current hash code.
>
> 3) When we peek at a spill file, if its next group’s hash code matches
> our current hash code, we’ll check to see if that group matches any of
> the groups in our list. If so, we’ll combine it with the matching
> group. If not, we’ll add this group to our list.
>
> 4) Once the head of each spill file exceeds the current hash code,
> we’ll emit our list as output, empty it, and advance to the next hash
> code.
>
> Does that seem reasonable?
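For discussion, the scheme above can be sketched as a k-way merge over
hash-ordered spill streams. This is an illustrative Python model, not
PostgreSQL code: each "spill file" is a list of (hash_code, key, state)
tuples pre-sorted by hash code, and integer addition stands in for
combining transition states.

```python
import heapq

def merge_spills(spill_files):
    """Merge partial aggregate states from hash-ordered spill streams,
    resolving hash collisions with a per-hash-code group list."""
    # k-way merge of all streams, ordered by hash code
    merged = heapq.merge(*spill_files, key=lambda t: t[0])
    current_hash = None
    groups = []  # [key, state] pairs sharing the current hash code
    for hash_code, key, state in merged:
        if hash_code != current_hash:
            # every stream has advanced past current_hash:
            # the groups in the list are final, so emit them
            for k, s in groups:
                yield k, s
            groups = []
            current_hash = hash_code
        for g in groups:
            if g[0] == key:      # true key match, not just a collision
                g[1] += state    # combine partial states
                break
        else:
            groups.append([key, state])
    for k, s in groups:          # flush the last hash code's groups
        yield k, s
```

The list stays small because it only ever holds groups whose keys
collide on the same hash code, which is the complexity the proposal is
paying for.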
It seems algorithmically reasonable but awfully complex. I don't think
that's a good path to take.
I still only see two viable approaches:
(1) Spill tuples into hash partitions: works and is a reasonably simple
extension of our existing code. This is basically what the last patch I
posted does (posted before grouping sets, so needs to be updated).
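A rough model of approach (1), to make the comparison concrete: when the
hash table reaches its budget, overflow tuples are routed by hash bits
into partitions, and each partition is aggregated on its own afterwards
(recursing if it still doesn't fit). Names here are illustrative, not
PostgreSQL's; `max_groups` stands in for work_mem.

```python
def hash_agg_spill(tuples, hashfn, max_groups, npartitions=4, depth=0):
    table = {}
    spills = [[] for _ in range(npartitions)]
    for key, val in tuples:
        if key in table:
            table[key] += val     # advance the existing transition state
        elif len(table) < max_groups:
            table[key] = val      # new group still fits in memory
        else:
            # no room: spill the tuple, choosing a partition from hash
            # bits not already consumed at an earlier recursion depth
            part = (hashfn(key) >> (depth * 2)) % npartitions
            spills[part].append((key, val))
    yield from table.items()
    for part in spills:
        if part:
            yield from hash_agg_spill(part, hashfn, max_groups,
                                      npartitions, depth + 1)
```

Since all tuples of a given key hash to the same partition, each group is
finalized exactly once, either in the in-memory pass or in one recursive
pass.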
(2) Spill tuples into a sort: I like this approach because it can be
done simply (potentially less code than #1), and could be further
improved without adding a ton of complexity. It may even eliminate the
need for the planner to choose between hashagg and sort. The problem
here is data types that have a hash opclass but no btree opclass. I
might try to sidestep this problem by saying that data types with no
btree opclass will not obey work_mem.
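Approach (2) could be modeled roughly as follows: fill the hash table
until the budget is exceeded, then dump the partial states together with
all remaining input into a sort on the grouping key and finish with a
GroupAggregate-style pass. This is a hedged sketch with illustrative
names; note the final pass needs an ordering on the key, i.e. a btree
opclass in PostgreSQL terms.

```python
from itertools import groupby

def hash_then_sort_agg(tuples, max_groups):
    table = {}
    overflow = []
    it = iter(tuples)
    for key, val in it:
        if key in table:
            table[key] += val
        elif len(table) < max_groups:
            table[key] = val
        else:
            # budget exceeded: spill this tuple and everything after it
            overflow.append((key, val))
            overflow.extend(it)
            break
    if not overflow:
        return sorted(table.items())
    # feed the partial states into the sort alongside the raw tuples;
    # the sorted pass then combines everything key by key
    spilled = sorted(list(table.items()) + overflow)
    return [(k, sum(v for _, v in grp))
            for k, grp in groupby(spilled, key=lambda t: t[0])]
```

The appeal is that the post-spill path is just the existing sorted
aggregation strategy, which is why it may need less new code than
partitioned spilling.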
Additionally, there are two other ideas that we might want to do
regardless of which approach we take:
* support evicting transition states from the hash table, so that we
can handle clustered input better
* refactor grouping sets so that phases are separate executor nodes so
that we can undo some of the complexity in nodeAgg.c
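The eviction idea in the first bullet could be modeled roughly like
this: keep transition states in recency order and, when the table is
full, evict the least-recently-used state to a spill list rather than
spilling raw tuples. With clustered input a group's state is usually
evicted only after its cluster has passed, so most groups finish in
memory. Again an illustrative sketch, not nodeAgg.c's structure.

```python
from collections import OrderedDict

def lru_hash_agg(tuples, max_groups):
    table = OrderedDict()
    spilled = []  # evicted partial states, combined in a final pass
    for key, val in tuples:
        if key in table:
            table[key] += val
            table.move_to_end(key)  # mark as recently used
        else:
            if len(table) >= max_groups:
                # evict the least-recently-used transition state
                spilled.append(table.popitem(last=False))
            table[key] = val
    # final pass: merge evicted partial states with in-memory ones
    result = {}
    for key, val in spilled + list(table.items()):
        result[key] = result.get(key, 0) + val
    return result
```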
Regards,
Jeff Davis