From: Andres Freund <andres(at)anarazel(dot)de>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: HashAgg degenerate case
Date: 2024-11-12 15:48:31
Message-ID: 3btw7dgj56k5samfzyrxnqdoiocbiycp44osza3iahzn3jqdro@ecxairlbdzqx
Lists: pgsql-bugs
Hi,
On 2024-11-08 12:40:39 -0800, Jeff Davis wrote:
> On Fri, 2024-11-08 at 10:48 -0800, Jeff Davis wrote:
> > I can think of two approaches to solve it:
>
> Another thought: the point of spilling is to avoid using additional
> memory by adding a group.
>
> If the bucket array is 89.8% full, adding a new group doesn't require
> new buckets to be allocated, so we only have the firstTuple and
> pergroup data to worry about. If that causes the memory limit to be
> exceeded, it's the perfect time to switch to spill mode.
> But if the bucket array is 89.9% full, then adding a new group will
> cause the bucket array to double. If that causes the memory limit to be
> exceeded, then we can switch to spill mode, but it's wasteful to do so
> because (a) we won't be using most of those new buckets; (b) the new
> buckets will crowd out space for subsequent batches and even fewer of
> the buckets will be used; and (c) the memory accounting can be off by
> quite a bit.
> What if we have a check where, if the metacxt is using more than 40% of
> the memory, and if adding a new group would reach the grow_threshold,
> then enter spill mode immediately?
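For concreteness, a minimal standalone sketch of that check. Only the 40%
ratio and the grow_threshold condition come from the description above; the
struct fields, helper name, and example numbers are illustrative assumptions,
not the actual nodeAgg.c/simplehash code:

/*
 * Toy model of the proposed check: enter spill mode before inserting if the
 * bucket array (metacxt) already uses more than 40% of the memory and the
 * next insert would reach the grow threshold.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct HashStats
{
	uint64_t	members;		/* groups currently in the hash table */
	uint64_t	grow_threshold; /* table doubles once members reach this */
	uint64_t	bucket_mem;		/* bytes used by the bucket array (metacxt) */
	uint64_t	total_mem;		/* bytes used by hash aggregation overall */
} HashStats;

static bool
spill_before_insert(const HashStats *st)
{
	bool		buckets_dominate = st->bucket_mem * 10 > st->total_mem * 4; /* > 40% */
	bool		next_insert_grows = st->members + 1 >= st->grow_threshold;

	return buckets_dominate && next_insert_grows;
}

int
main(void)
{
	/* bucket array at 50% of memory, one insert away from doubling */
	HashStats	st = {.members = 943717, .grow_threshold = 943718,
					  .bucket_mem = 8 << 20, .total_mem = 16 << 20};

	printf("spill now: %s\n", spill_before_insert(&st) ? "yes" : "no");
	return 0;
}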
That assumes the bucket array is a significant portion of the memory usage -
but that's not a given, right? We also might need to grow to keep the number
of conflict chains at a manageable level.
> To make this work, I think we either need to use a tuplehash_lookup()
> followed by a tuplehash_insert() (two lookups for each new group), or we
> would need a new API into simplehash like tuplehash_insert_without_growing()
> that would return NULL instead of growing.
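Purely as an illustration of the second option's semantics, using a toy
open-addressing table rather than simplehash itself (insert_no_grow() and all
names here are assumptions, not an existing API):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct ToyEntry
{
	uint64_t	key;
	bool		used;
} ToyEntry;

typedef struct ToyTable
{
	ToyEntry   *entries;
	size_t		size;			/* power of two, greater than grow_threshold */
	size_t		members;
	size_t		grow_threshold;
} ToyTable;

/*
 * Like a normal insert, except that when adding a new entry would require
 * growing the table it returns NULL and leaves the table untouched, so the
 * caller can spill the tuple instead.  Lookups of existing keys still work.
 */
static ToyEntry *
insert_no_grow(ToyTable *tb, uint64_t key, bool *found)
{
	size_t		i = key & (tb->size - 1);

	for (;;)
	{
		ToyEntry   *e = &tb->entries[i];

		if (e->used && e->key == key)
		{
			*found = true;
			return e;
		}
		if (!e->used)
		{
			if (tb->members >= tb->grow_threshold)
				return NULL;	/* new group would force a grow */
			e->used = true;
			e->key = key;
			tb->members++;
			*found = false;
			return e;
		}
		i = (i + 1) & (tb->size - 1);
	}
}

int
main(void)
{
	ToyEntry	entries[8] = {0};
	ToyTable	tb = {.entries = entries, .size = 8, .members = 0,
					  .grow_threshold = 7};
	bool		found;

	for (uint64_t k = 1; k <= 8; k++)
	{
		if (insert_no_grow(&tb, k, &found) == NULL)
		{
			printf("key %llu would force a grow; spill it instead\n",
				   (unsigned long long) k);
			break;
		}
	}
	return 0;
}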
The former seems entirely non-viable on performance grounds. The latter might
be ok, but I'm wary on code complexity grounds.
What about simply checking whether the bucket array was a significant portion
of the memory usage at spill time and recreating the hashtable instead of
resetting IFF it was?
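Roughly, with a toy table standing in for the real thing (the 50% cutoff and
every name below are assumptions, not the actual simplehash/nodeAgg code),
that would look something like:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct ToyHashTable
{
	uint64_t   *buckets;		/* stand-in for the bucket array */
	size_t		nbuckets;
	size_t		members;
} ToyHashTable;

static ToyHashTable *
toy_create(size_t nbuckets)
{
	ToyHashTable *ht = malloc(sizeof(ToyHashTable));

	ht->buckets = calloc(nbuckets, sizeof(uint64_t));
	ht->nbuckets = nbuckets;
	ht->members = 0;
	return ht;
}

static void
toy_destroy(ToyHashTable *ht)
{
	free(ht->buckets);
	free(ht);
}

/*
 * At spill time: if the bucket array accounts for most of the memory used,
 * rebuild a fresh, right-sized table instead of resetting, so later batches
 * don't inherit an oversized bucket array.  Otherwise a plain reset keeps
 * the existing allocation.
 */
static ToyHashTable *
reset_or_rebuild(ToyHashTable *ht, size_t total_mem, size_t expected_groups)
{
	size_t		bucket_mem = ht->nbuckets * sizeof(uint64_t);

	if (bucket_mem * 2 > total_mem)
	{
		toy_destroy(ht);
		return toy_create(expected_groups);
	}
	memset(ht->buckets, 0, bucket_mem);
	ht->members = 0;
	return ht;
}

int
main(void)
{
	ToyHashTable *ht = toy_create(1 << 20);	/* grew large during the first pass */

	ht = reset_or_rebuild(ht, 12 << 20, 1024);
	printf("nbuckets after spill: %zu\n", ht->nbuckets);
	toy_destroy(ht);
	return 0;
}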
> This approach might not be backportable, but it might be a good approach for
> 18+.
I doubt any change around this ought to be backpatched. The likelihood of
regressions seems too high.
Greetings,
Andres Freund