Re: Spilling hashed SetOps and aggregates to disk

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject: Re: Spilling hashed SetOps and aggregates to disk
Date: 2018-06-11 15:55:02
Message-ID: CA+TgmoZaCQVo9z7AF3vOuz-+pk1-hPTFaee-hD8EDMVpLfOJoQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jun 11, 2018 at 11:29 AM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> I think the underlying question is whether we want to treat this as an
> emergency safety measure against OOM (which should be rare in most cases) or
> as something that allows us to pick hash aggregation more often.
>
> It would be great to get something that performs better than just falling
> back to sort (and I was advocating for that), but I'm worried we might be
> moving the goalposts way too far.

Well, the scope of the first patch is always negotiable, but I think
we should try to think beyond the first patch in design discussions,
so that we don't paint ourselves into a corner. I also think it's
worth noting that an emergency safety mechanism which also causes a 10x
performance degradation is not necessarily much better than OOM. I
suspect that part of the reason why both Andres and David proposed
algorithms that combined hashing and sorting is out of a desire to
cater somewhat to both few-tuples-per-group and many-tuples-per-group.
AFAICS, Andres's algorithm will be better for few-tuples-per-group
(maybe with a few large groups mixed in) and David's algorithm will be
better for many-tuples-per-group (maybe with some small groups mixed
in), but in each case the inclusion of a sorting component seems like
it will ease the pain in the opposite use case. However, I wonder
whether it would be better still to just acknowledge that those two
cases really need separate algorithms and then think about the best
approach for each one individually.

One big advantage to just partitioning the input tuples by hash code
and then proceeding batch by batch is that it works for any aggregate
that can support hash aggregation in the first place. You do not need
a btree opclass. You do not need support for serialize/deserialize or
combine. If you have serialize/deserialize, it's better, because now
you can spill growing transition values out of the hash table on the
fly. But the basic contours of the algorithm don't change even if
such support is lacking. That's a significant advantage in my view,
because with some of these more complex strategies, we'll end up with
code paths for data types lacking the necessary support which are
almost never exercised in practice. That's likely to lead to bugs.
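To make the partition-by-hash-code idea concrete, here is a minimal
sketch (illustrative Python, not PostgreSQL code; the function names,
the fixed batch count, and the in-memory "files" are assumptions for
the example). It shows why only hashability of the grouping key is
required: tuples are split into batches by hash code, every group
lands wholly within one batch, and each batch is then aggregated
independently with an ordinary in-memory hash table. No btree
ordering and no combine/serialize support is needed.

```python
NUM_BATCHES = 4  # illustrative; a real implementation would size this
                 # from memory limits such as work_mem

def spill_and_aggregate(tuples, agg_init, agg_trans):
    """Aggregate (key, value) tuples by partitioning on hash code.

    agg_init() produces an initial transition value; agg_trans(state,
    value) advances it. Only hash(key) is required of the key type.
    """
    # Phase 1: partition the input by hash code. In a real executor
    # each batch would be a spill file on disk rather than a list.
    batches = [[] for _ in range(NUM_BATCHES)]
    for key, value in tuples:
        batches[hash(key) % NUM_BATCHES].append((key, value))

    # Phase 2: process batch by batch. Because a given key always
    # hashes to the same batch, no group spans two batches, so the
    # per-batch results can simply be concatenated.
    results = {}
    for batch in batches:
        table = {}
        for key, value in batch:
            if key not in table:
                table[key] = agg_init()
            table[key] = agg_trans(table[key], value)
        results.update(table)
    return results
```

For example, a SUM-style aggregate would pass `lambda: 0` and
`lambda s, v: s + v`; the answer is identical to a single-pass hash
aggregate, but peak memory is bounded by the largest batch rather
than by the full set of groups.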

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
