Re: Spilling hashed SetOps and aggregates to disk

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Spilling hashed SetOps and aggregates to disk
Date: 2018-06-13 19:53:10
Message-ID: 1528919590.8818.93.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 2018-06-13 at 11:50 -0300, Claudio Freire wrote:
> In essence, the technique I've been using always uses a fixed-size
> hash table to do partial grouping. The table is never grown, when
> collisions do happen, the existing entry in the hash table is flushed
> to disk and the aggregate state in that bucket reset for the incoming
> key. To avoid excessive spilling due to frequent collisions, we use a
> kind of "lazy cuckoo" hash table. Lazy in the sense that it does no
> relocations, it just checks 2 hash values, and if it has to evict, it
> evicts the "oldest" value (with some cheap definition of "old").

...

> An adaptive hash agg node would start as a hash agg, and turn into a
> "partial hash agg + sort + final group agg" when OOM is detected.
>
> The benefit over ordinary sort+group agg is that the sort is
> happening
> on a potentially much smaller data set. When the buffer is large
> enough to capture enough key repetitions, the output of the partial
> hash agg can be orders of magnitude smaller than the scan.
>
> My proposal was to use this for all group aggs, not only when the
> planner chooses a hash agg, because it can speed up the sort and
> reduce temp storage considerably. I guess the trick lies in
> estimating
> that cardinality reduction to avoid applying this when there's no
> hope
> of getting a payoff. The overhead of such a lazy hash table isn't
> much, really. But yes, its cache locality is terrible, so it needs to
> be considered.

I think this is a promising approach because it means the planner has
one less decion to make. And the planner doesn't have great information
to make its decision on, anyway (ndistinct estimates are hard enough on
plain tables, and all bets are off a few levels up in the plan).

Unfortunately, it means that either all data types must support hashing
and sorting[1], or we need to have a fallback path that can get by with
hashing alone (which would not be tested nearly as well as more typical
paths).

I still like this idea, though.

Regards,
Jeff Davis

[1] https://www.postgresql.org/message-id/9584.1528739975%40sss.pgh.pa.
us

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2018-06-13 19:59:24 Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Previous Message Fabien COELHO 2018-06-13 19:44:52 Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors