From: "Tomas Vondra" <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: dynahash vs. memory-bounded HashAggregate (hashjoin-style)
Date: 2014-08-19 11:44:46
Message-ID: 1b2b61ed050899c569bafbaef0a8b821.squirrel@sq.gransy.com
Lists: pgsql-hackers

Hi all,

While working on a prototype of a memory-bounded hash aggregate (an
alternative to Jeff's patch discussed in [1]), I ran into difficulties when
dealing with dynahash, so I'm asking for help ...

Some of the difficulties stem from my limited knowledge of dynahash and the
complexity of execGrouping.c, but it seems to me that some are actually due
to a mismatch with the proposed hashjoin-like approach to batching.

The hashjoin-like batching essentially does this:

1) put data (in this case aggregate states) into a hash table, until a
memory limit is reached
2) double the number of batches and move ~half of the entries out of the hash
table, based on the batch number computed from the hash value (see the
sketch below)
3) repeat
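
For illustration, here's a minimal, self-contained sketch of the batch-number
math that makes step 2 work. This is not PostgreSQL code - the real hashjoin
code in nodeHash.c takes the batch bits from a different part of the hash
value so they don't overlap with the bucket bits; masking the low bits here
just keeps the sketch short:

    /*
     * Sketch only.  With nbatch a power of two and
     * batchno = hashvalue & (nbatch - 1), doubling nbatch means an entry
     * currently in batch b either stays in b or moves to b + old_nbatch,
     * i.e. repartitioning only ever pushes entries into batches that have
     * not been processed yet.
     */
    #include <stdint.h>
    #include <stdio.h>

    static int
    batch_for_hash(uint32_t hashvalue, int nbatch)
    {
        return (int) (hashvalue & (uint32_t) (nbatch - 1));
    }

    int
    main(void)
    {
        uint32_t    hashvalue = 0xDEADBEEF;
        int         nbatch;

        for (nbatch = 1; nbatch <= 16; nbatch *= 2)
            printf("nbatch = %2d -> batch %2d\n",
                   nbatch, batch_for_hash(hashvalue, nbatch));

        return 0;
    }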

This, however, requires two things that seem quite difficult to do with
dynahash:

(a) detecting that a memory limit was reached

This is usually done with a condition like this:

if (consumed_memory > work_mem)
{
    /* do the batching - remove ~50% of the data from the hash table */
    /* consumed_memory is then expected to drop by ~50% */
}

Here, consumed_memory is the size of a memory context (cheap to compute,
thanks to another patch from Jeff [2]).

This, however, does not work with dynahash, because dynahash does not
release memory for removed tuples - it just moves them to a freelist - so
consumed_memory only ever grows.

For a while I thought I could do this:

if (consumed_memory > consumed_memory_prev)
{
    /* ... do the batching, as above ... */
    consumed_memory_prev = consumed_memory;
}

But then I found out that dynahash does not grow continuously, but in
(quite large) steps. Exceeding the limit by a little is not a big deal, but
the steps are large and quickly lead to allocating much more memory than
the limit.

(b) removing tuples while batching

Whenever the number of batches is increased (doubled), I need to walk
through the hash table and remove the entries that don't belong to the
current batch (which should be ~50% of them). The only way to do this with
dynahash seems to be iterating over the entries and then doing another
lookup with HASH_REMOVE for each one. Is there a better way?
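
For reference, here's a minimal sketch of that two-step pattern using the
dynahash API (hash_seq_init / hash_seq_search / hash_search with HASH_REMOVE).
The entry struct, the helper name, and the batch test are made up for the
example - in execGrouping.c the entries are TupleHashEntryData and lookups go
through slots and dummy entries, which is part of what makes this awkward:

    #include "postgres.h"
    #include "utils/hsearch.h"

    /* hypothetical entry layout - the key must be the first field */
    typedef struct MyAggKey
    {
        int64       groupid;        /* made-up grouping key */
    } MyAggKey;

    typedef struct MyAggEntry
    {
        MyAggKey    key;            /* hash key (first field) */
        uint32      hashvalue;      /* saved hash, to recompute the batch */
        /* ... aggregate state would follow ... */
    } MyAggEntry;

    static void
    remove_other_batches(HTAB *hashtable, int curbatch, int nbatch)
    {
        HASH_SEQ_STATUS scan;
        MyAggEntry *entry;

        hash_seq_init(&scan, hashtable);
        while ((entry = hash_seq_search(&scan)) != NULL)
        {
            int         batchno = entry->hashvalue & (nbatch - 1);

            if (batchno != curbatch)
            {
                /* spill the entry to its batch file here, then ... */

                /*
                 * ... remove it.  Deleting the entry just returned by the
                 * seq scan is allowed, but removal always goes through a
                 * key lookup - there's no "remove this entry" call.
                 */
                hash_search(hashtable, &entry->key, HASH_REMOVE, NULL);
            }
        }
    }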

I've been considering switching this to a custom hash table (similar to
what's used in hashjoin:
https://commitfest.postgresql.org/action/patch_view?id=1494), which seems
like a better solution for this use case, but I'm not sure. I'm not a big
fan of replacing large amounts of code for no good reason.

Opinions?

I'd be grateful if someone more knowledgeable about dynahash / the way it's
used in execGrouping.c could review the prototype I have so far. There's
only a handful of functions related to dynahash, and most of the issues I
have are about passing the values around properly (slots, dummy values,
tuples).

regards
Tomas

[1]
http://www.postgresql.org/message-id/1407706010.6623.16.camel@jeff-desktop

[2]
http://www.postgresql.org/message-id/1407012053.15301.53.camel@jeff-desktop
