From: | Tomas Vondra <tomas(at)vondra(dot)me> |
---|---|
To: | Melanie Plageman <melanieplageman(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Adjusting hash join memory limit to handle batch explosion |
Date: | 2025-01-09 17:01:55 |
Message-ID: | 36af890e-464d-41ae-88a8-d06317f36a98@vondra.me |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 1/9/25 17:17, Melanie Plageman wrote:
> On Tue, Dec 31, 2024 at 6:07 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>>
>> So I decided to revisit the three patches from 2019. Attached are
>> rebased and cleaned up versions. A couple comments on each one:
>>
>>
>> 1) v20241231-adjust-limit-0001-Account-for-batch-files-in-ha.patch
>>
>> I believe this is the way to go, for now. The basic idea is to keep the
>> overall behavior, but "relax" the memory limit as the number of batches
>> increases to minimize the total memory use.
>>
>> This may seem a bit weird, but as the number of batches grows there's no
>> way to not violate the limit. And the current code simply ignores this
>> and allocates arbitrary amounts of memory.
>
> I'm just catching up on this thread and haven't read all the mails
> yet. I started with looking at the patches in the first email and got
> a bit confused.
>
> In this patch (v20241231-adjust-limit-0001-Account-for-batch-files-in-ha.patch),
> I see that you've started accounting for the spill files in
> hashtable->spaceUsed -- in the same way that is done for the tuples in
> the hashtable. I know the other memory contexts (hashCxt and batchCxt)
> in hashtable aren't appropriate for figuring out spaceUsed, but I was
> wondering if the hashtable->spillCxt accurately reflects how much
> memory is being used for these spill files at one time? Perhaps it
> doesn't make sense to use this, but when we added it in 8c4040edf45, I
> thought we might one day be able to use it for determining peak space
> usage. Or perhaps you are imagining it only be used for observability?
>
Good question. Yes, the patch from 12/31 does look at all the memory,
including the batch files. I thought about using the spillCtx too, but I
don't think it it would work because the context tracks *current* memory
usage, and we're interested in how much memory would be used *after*
doubling the number of batches.
regards
--
Tomas Vondra
From | Date | Subject | |
---|---|---|---|
Next Message | Pavel Stehule | 2025-01-09 17:08:39 | Re: why there is not VACUUM FULL CONCURRENTLY? |
Previous Message | Sami Imseih | 2025-01-09 16:56:36 | Re: Psql meta-command conninfo+ |