From: Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Memory leak from ExecutorState context?
Date: 2023-04-21 19:18:49
Message-ID: 8338f4ee-47f0-f622-e37f-b1ce1dbfdf38@garret.ru
Lists: pgsql-hackers
On 21.04.2023 1:51 AM, Melanie Plageman wrote:
> On Thu, Apr 20, 2023 at 12:42 PM Konstantin Knizhnik <knizhnik(at)garret(dot)ru> wrote:
>> On 11.04.2023 8:14 PM, Jehan-Guillaume de Rorthais wrote:
>>> On Sat, 8 Apr 2023 02:01:19 +0200
>>> Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com> wrote:
>>>
>>>> On Fri, 31 Mar 2023 14:06:11 +0200
>>>> Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com> wrote:
>>>>
>>>> [...]
>>>>
>>>> After rebasing Tomas' memory balancing patch, I did some memory measurements
>>>> to answer some of my questions. Please find attached the resulting
>>>> charts "HJ-HEAD.png" and "balancing-v3.png", comparing memory consumption
>>>> between HEAD and Tomas' patch. They show an alternation of numbers
>>>> before/after calling ExecHashIncreaseNumBatches (see the debug patch). I
>>>> didn't try to find the exact last total peak of memory consumption during the
>>>> join phase and before all the BufFiles are destroyed, so the last number
>>>> might be underestimated.
>>> I did some more analysis of the total memory consumption in filecxt for HEAD
>>> and the v3 and v4 patches. My previous debug numbers only print memory metrics
>>> during batch increments or hash table destruction. That means:
>>>
>>> * for HEAD: we miss the batches consumed during the outer scan
>>> * for v3: adds twice nbatch in spaceUsed, which is a rough estimation
>>> * for v4: batches are tracked in spaceUsed, so they are reflected in spacePeak
>>>
>>> Using a breakpoint in ExecHashJoinSaveTuple to print "filecxt->mem_allocated"
>>> from there, here is the maximum allocated memory in the bufFile context for
>>> each branch:
>>>
>>>            batches   max bufFiles   total    spaceAllowed rise
>>> HEAD        16384     199966960     ~194MB
>>> v3           4096      65419456      ~78MB
>>> v4(*3)       2048      34273280       48MB   nbatch*sizeof(PGAlignedBlock)*3
>>> v4(*4)       1024      17170160     60.6MB   nbatch*sizeof(PGAlignedBlock)*4
>>> v4(*5)       2048      34273280     42.5MB   nbatch*sizeof(PGAlignedBlock)*5
>>>
>>> It seems accounting for bufFile in spaceUsed allows better memory balancing
>>> and management. The precise factor by which to raise spaceAllowed is yet to be
>>> defined. *3 or *4 looks good, but this is based on a single artificial test case.
>>>
>>> Also, note that HEAD is currently reporting ~4MB of memory usage, which is far
>>> from reality. So even if we don't commit the memory balancing patch in v16,
>>> maybe we could account for filecxt in spaceUsed as a bugfix?
>>>
>>> Regards,
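
As an aside, here is a minimal sketch of that accounting idea, based on my own assumptions rather than on the actual v3/v4 patch code: wherever a batch file is first created, charge its in-memory buffer against the hash table, so that spacePeak reflects the BufFile overhead as well.

/*
 * Illustration only, not the code from the patches under discussion.
 * Assumes the HashJoinTable is in scope where batch files are created
 * and that each BufFile keeps one BLCKSZ-sized buffer in memory.
 */
if (*fileptr == NULL)
{
    *fileptr = BufFileCreateTemp(false);

    /* account for the file's in-memory buffer in the hash table */
    hashtable->spaceUsed += sizeof(PGAlignedBlock);
    if (hashtable->spaceUsed > hashtable->spacePeak)
        hashtable->spacePeak = hashtable->spaceUsed;
}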
>> Thank you for the patch.
>> I faced the same problem (OOM caused by a hash join).
>> I tried to create the simplest test reproducing the problem:
>>
>> create table t(pk int, val int);
>> insert into t values (generate_series(1,100000000),0);
>> set work_mem='64kB';
>> explain (analyze,buffers) select count(*) from t t1 join t t2 on (t1.pk=t2.pk);
>>
>>
>> There are three workers and the size of each of them exceeds 1.3GB.
>>
>> The plan is the following:
>>
>>  Finalize Aggregate  (cost=355905977972.87..355905977972.88 rows=1 width=8) (actual time=212961.033..226097.513 rows=1 loops=1)
>>    Buffers: shared hit=32644 read=852474 dirtied=437947 written=426374, temp read=944407 written=1130380
>>    ->  Gather  (cost=355905977972.65..355905977972.86 rows=2 width=8) (actual time=212943.505..226097.497 rows=3 loops=1)
>>          Workers Planned: 2
>>          Workers Launched: 2
>>          Buffers: shared hit=32644 read=852474 dirtied=437947 written=426374, temp read=944407 written=1130380
>>          ->  Partial Aggregate  (cost=355905976972.65..355905976972.66 rows=1 width=8) (actual time=212938.410..212940.035 rows=1 loops=3)
>>                Buffers: shared hit=32644 read=852474 dirtied=437947 written=426374, temp read=944407 written=1130380
>>                ->  Parallel Hash Join  (cost=1542739.26..303822614472.65 rows=20833345000002 width=0) (actual time=163268.274..207829.524 rows=33333333 loops=3)
>>                      Hash Cond: (t1.pk = t2.pk)
>>                      Buffers: shared hit=32644 read=852474 dirtied=437947 written=426374, temp read=944407 written=1130380
>>                      ->  Parallel Seq Scan on t t1  (cost=0.00..859144.78 rows=41666678 width=4) (actual time=0.045..30828.051 rows=33333333 loops=3)
>>                            Buffers: shared hit=16389 read=426089 written=87
>>                      ->  Parallel Hash  (cost=859144.78..859144.78 rows=41666678 width=4) (actual time=82202.445..82202.447 rows=33333333 loops=3)
>>                            Buckets: 4096 (originally 4096)  Batches: 32768 (originally 8192)  Memory Usage: 192kB
>>                            Buffers: shared hit=16095 read=426383 dirtied=437947 written=426287, temp read=267898 written=737164
>>                            ->  Parallel Seq Scan on t t2  (cost=0.00..859144.78 rows=41666678 width=4) (actual time=0.054..12647.534 rows=33333333 loops=3)
>>                                  Buffers: shared hit=16095 read=426383 dirtied=437947 written=426287
>>  Planning:
>>    Buffers: shared hit=69 read=38
>>  Planning Time: 2.819 ms
>>  Execution Time: 226113.292 ms
>> (22 rows)
>>
>>
>>
>> -----------------------------
>>
>> So the number of batches has been increased to 32k.
>> I applied your patches 0001-0004, but unfortunately they have not reduced
>> memory consumption: the size of each backend is still more than 1.3GB.
> Is this EXPLAIN ANALYZE run on an instance with Jehan-Guillaume's
> patchset applied or without?
>
> I'm asking because the fourth patch in the series updates spaceUsed with
> the size of the BufFile->buffer, but I notice in your EXPLAIN ANALYZE,
> Memory Usage for the hashtable is reported as 192 kB, which, while
> larger than the 64kB work_mem you set, isn't as large as I might expect.
>
> - Melanie
Yes, this is EXPLAIN ANALYZE output from the Postgres version with the
following 4 patches applied:
0001-v4-Describe-hybrid-hash-join-implementation.patch
0002-v4-Allocate-hash-batches-related-BufFile-in-a-dedicated.patch
0003-v4-Add-some-debug-and-metrics.patch
0004-v4-Limit-BufFile-memory-explosion-with-bad-HashJoin.patch
Just as a workaround I tried the attached patch: it prevents the backends'
memory footprint from growing by limiting the number of created batches.
I am not sure that it is the right solution, because in any case we allocate
more memory than specified by work_mem. The alternative is to prohibit the
hash join plan in this case, but that is also not a good solution, because
a merge join tends to be much slower.
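
To sketch the idea behind that workaround (a rough illustration of my own, not the attached limit_batches.patch itself): each batch needs an inner and an outer temp file, and each BufFile keeps a BLCKSZ-sized (8kB) buffer in memory, so at 32768 batches the file buffers alone already add up to hundreds of megabytes per process. A guard along these lines would simply refuse to double nbatch once that overhead would exceed the allowed space (the function name is hypothetical):

/*
 * Rough sketch, not the attached patch: refuse to double the number of
 * batches once the BufFile buffers alone (two files per batch, BLCKSZ
 * bytes of buffer each) would exceed the hash table's allowed space.
 */
static bool
batch_increase_allowed(HashJoinTable hashtable)
{
    size_t      new_nbatch = (size_t) hashtable->nbatch * 2;
    size_t      buffile_overhead = new_nbatch * 2 * sizeof(PGAlignedBlock);

    return buffile_overhead <= hashtable->spaceAllowed;
}

One possible placement (again an assumption, not necessarily what the attached patch does) is for ExecHashIncreaseNumBatches() to return early when this check fails, accepting that the current batches overrun work_mem instead of multiplying temp files.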
Attachment: limit_batches.patch (text/x-patch, 1.7 KB)