From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: HashAgg degenerate case
Date: 2024-11-06 00:59:56
Message-ID: a050b6067334cf513bf34f9c7bc925d651160f3a.camel@j-davis.com

HashAgg tracks the memory used by the hash table's bucket array (just
the TupleHashEntryData entries) separately from the memory used by the
hash table as a whole (including each entry's firstTuple).

It can run into a degenerate case when the bucket array grows larger
than the allotted memory for the overall hash table. That, by itself,
shouldn't be a major problem because it would just spill any new groups
and then start over in the next batch.

However, ResetTupleHashTable() calls tuplehash_reset(), which
preserves the bucket array (merely zeroing it). That means the next
batch starts out already over the limit, and the hash table
degenerates to holding a single tuple at a time.
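
To make the mechanism concrete, here is a minimal model of the reset
behavior (an illustrative sketch in the spirit of simplehash, not the
actual PostgreSQL code; names and fields are simplified):

    #include <stddef.h>
    #include <string.h>

    /* Stand-in for TupleHashEntryData: the per-bucket metadata. */
    typedef struct
    {
        void        *firstTuple; /* group's first tuple; tracked separately */
        unsigned int hash;
        char         status;
    } Entry;

    typedef struct
    {
        Entry  *data;    /* bucket array; this is what stays allocated */
        size_t  size;    /* number of allocated buckets */
        size_t  members; /* number of occupied buckets */
    } HashTable;

    /*
     * Model of tuplehash_reset(): zero the entries but keep the
     * allocation.  If the bucket array grew past the memory limit in
     * the previous batch, the next batch begins already over the
     * limit, because data and size are untouched.
     */
    static void
    hash_reset(HashTable *tb)
    {
        memset(tb->data, 0, sizeof(Entry) * tb->size);
        tb->members = 0;
    }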

Unfortunately I don't have a small reproducible case yet. I believe it
would need to be a case where the bucket array is more than half of the
memory limit, and then the hash table needs to grow again (doubling).
The actual case involves parallelism, so maybe that throws off the
estimates in an interesting way.

Fixing it seems fairly easy, though: we just need to destroy the hash
table completely each time and recreate it, something close to the
attached (rough) patch.
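
In the spirit of the patch (a sketch against the toy model above, not
the patch itself), the reset path would free the table and rebuild it
from scratch rather than clearing it in place:

    #include <stdlib.h>

    /*
     * Sketch of the fix direction, reusing the Entry/HashTable model
     * from the earlier snippet: free the bucket array and rebuild it
     * at the initial size, so the next batch starts back under the
     * memory limit.  (Allocation-failure handling omitted for
     * brevity.)
     */
    static void
    hash_rebuild(HashTable *tb, size_t initial_size)
    {
        free(tb->data);
        tb->data = calloc(initial_size, sizeof(Entry));
        tb->size = initial_size;
        tb->members = 0;
    }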

We might also want to make the degenerate case less painful -- should
we raise the minimum number of groups per batch? One possible problem
is groups whose per-group state is large, as with ARRAY_AGG(), but we
could use different limits for those.

Regards,
Jeff Davis

Attachment Content-Type Size
v1-0001-HashAgg-completely-rebuild-hash-tables-each-itera.patch text/x-patch 1.3 KB
