Re: Adjusting hash join memory limit to handle batch explosion

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Adjusting hash join memory limit to handle batch explosion
Date: 2025-01-09 20:42:54
Message-ID: CAAKRu_bYZvc3gHa+PnE2BTwbu1xX3MgyG+99-MZh7Rg5QyOEEA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 31, 2024 at 6:07 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>
> This means that ultimately it's either (1) or (3), and the more I've
> been looking into this the more I prefer (1), for a couple reasons:
>
> * It's much simpler (it doesn't really change anything on the basic
> behavior, doesn't introduce any new files or anything like that.
>
> * There doesn't seem to be major difference in total memory consumption
> between the two approaches. Both allow allocating more memory.
>
> * It actually helps with the "indivisible batch" case - it relaxes the
> limit, so there's a chance the batch eventually fits and we stop adding
> more and more batches. With spill files that's not the case - we still
> keep the original limit, and we end up with the batch explosion (but
> then we handle it much more efficiently).

Okay, I've read all the patches proposed in this mail and most of the
downthread ideas, and I want to cast my vote for option 1.
I find the design of 3 too complicated for what it buys us.

The slices make it harder to understand how the system works. The
current 1-1 relationship in master between batches and spill files is
easier to reason about. With the slices, I'm imagining trying to
understand why we, for example, have to move tuples from batch 4 just
because batch 5 got too big for the hashtable.

I think something like this might be worth it if it solved the problem
entirely, but if it is just a somewhat better coping mechanism, I
don't think it is worth it.

I was excited about your raw file experiment. As Robert and you point
out -- we may need a file per batch, but for most of the hash join's
execution we don't need to keep buffers for each batch around.
However, given that the experiment didn't yield great results and we
haven't come up with an alternative solution with sufficiently few
flaws, I'm still in favor of 1.

The part of 1 I struggled to understand is the math in
ExecHashExceededMemoryLimit(). I think the other email you sent with
the charts and diagonals is about choosing the optimal hashtable size
and number of batches (when to stop growing the number of batches and
just increase the size of the hashtable). So, I'll dive into that.

> One thing I'm not sure about yet is whether this needs to tweak the
> hashjoin costing to also consider the files when deciding how many
> batches to use. Maybe it should?

I think it definitely should. The ExecChooseHashTableSize()
calculations look similar to what we use to calculate spaceAllowed, so
it makes sense that we would consider buffile sizes if we are counting
those in spaceUsed now.

- Melanie

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sami Imseih 2025-01-09 20:45:55 Re: Psql meta-command conninfo+
Previous Message Daniel Gustafsson 2025-01-09 20:40:04 Re: [PoC] Federated Authn/z with OAUTHBEARER