From: | "Wei Wang (Fujitsu)" <wangw(dot)fnst(at)fujitsu(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Alex Richman <alexrichman(at)onesignal(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Niels Stevens <niels(dot)stevens(at)onesignal(dot)com> |
Subject: | RE: Logical Replica ReorderBuffer Size Accounting Issues |
Date: | 2023-05-23 04:11:29 |
Message-ID: | OSZPR01MB6278C3FCBCE47A42CCF05DE99E409@OSZPR01MB6278.jpnprd01.prod.outlook.com |
Lists: | pgsql-bugs |
On Thu, May 9, 2023 at 22:58 Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Tue, May 9, 2023 at 6:06 PM Wei Wang (Fujitsu)
> > > I think there are two separate issues. One is a pure memory accounting
> > > issue: since the reorderbuffer accounts for memory usage by calculating
> > > the actual tuple sizes etc., it includes neither the chunk header size
> > > nor the fragmentation within blocks. So I can understand why the output
> > > of MemoryContextStats(rb->context) could be two or three times higher
> > > than logical_decoding_work_mem and doesn't match rb->size in some cases.
> > >
> > > However, it cannot explain the original issue that the memory usage
> > > (reported by MemoryContextStats(rb->context)) reached 5GB in spite of
> > > logical_decoding_work_mem being 256MB, which seems like a memory leak
> > > bug or some place where we ignore the memory limit.
> >
> > Yes, I agree that the chunk header size or the fragmentation within blocks may
> > cause the allocated space to be larger than the accounted space. However, since
> > this overhead is very small (please refer to [1] and [2]), I also don't think
> > it is the cause of the original issue in this thread.
> >
> > I think that the cause of the original issue in this thread is the
> > implementation of the generation allocator.
> > Please consider the following scenario:
> > The parallel execution of different transactions leads to heavily interleaved
> > WAL records for those transactions. Later, when the walsender serially decodes
> > the WAL, chunks belonging to different transactions end up on the same block
> > in rb->tup_context. When a transaction ends, the chunks related to that
> > transaction are only marked as free instead of being actually released; a
> > block is released only when all chunks in it are free, in other words, only
> > when all transactions occupying the block have ended. As a result, on many
> > blocks the chunks allocated by transactions that have already ended are not
> > released for a long time, and this issue occurs. I think this also explains
> > why parallel execution is more likely to trigger this issue than serial
> > execution of transactions. Please also refer to the analysis details of the
> > code in [3].
>
> After some investigation, I don't think the implementation of the
> generation allocator is problematic, but I agree that your scenario is
> likely to explain the original issue. In particular, the output of
> MemoryContextStats() shows:
>
> Tuples: 4311744512 total in 514 blocks (12858943 chunks);
> 6771224 free (12855411 chunks); 4304973288 used
>
> First, since the total memory allocation was 4311744512 bytes in 514
> blocks, we can see there were no special blocks in the context (8MB *
> 514 = 4311744512 bytes). Second, it shows that most chunks were free
> (12855411 of 12858943 chunks) but most memory was still used
> (4304973288 of 4311744512 bytes), which means that there were some
> in-use chunks at the tail of each block, i.e. most blocks were
> fragmented. I've attached another test to reproduce this behavior. In
> this test, the memory usage reaches almost 4GB.
>
> One idea to deal with this issue is to choose the block sizes
> carefully while measuring the performance, as the existing comment
> suggests:
>
> /*
> * XXX the allocation sizes used below pre-date generation context's block
> * growing code. These values should likely be benchmarked and set to
> * more suitable values.
> */
> buffer->tup_context = GenerationContextCreate(new_ctx,
> "Tuples",
> SLAB_LARGE_BLOCK_SIZE,
> SLAB_LARGE_BLOCK_SIZE,
> SLAB_LARGE_BLOCK_SIZE);
>
> For example, if I use SLAB_DEFAULT_BLOCK_SIZE (8kB), the maximum memory
> usage in the test was about 17MB.
Thanks for your idea.
I did some tests as you suggested. I think the modification mentioned above can
work around this issue in the test 002_rb_memory_2.pl on [1] (to make the
transactions reach the large-transaction size, I set logical_decoding_work_mem
to 1MB). But the test repreduce.sh on [2] still reproduces this issue. It seems
that this modification fixes a subset of use cases, but the issue still occurs
for other use cases.
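
For reference, the modification I tested is essentially just swapping the block
sizes passed to GenerationContextCreate(). The exact diff is in the attached
tmp-modification.patch; the snippet below is only a sketch of that idea:

/* In ReorderBufferAllocate(): use 8kB blocks instead of 8MB ones (sketch) */
buffer->tup_context = GenerationContextCreate(new_ctx,
                                              "Tuples",
                                              SLAB_DEFAULT_BLOCK_SIZE,
                                              SLAB_DEFAULT_BLOCK_SIZE,
                                              SLAB_DEFAULT_BLOCK_SIZE);

With 8kB blocks, each block holds changes from far fewer concurrent
transactions, so it becomes completely free, and is actually released, much
sooner.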

I think the block size affects how many transactions have changes stored on a
single block. For example, before the modification a block could hold some
changes of 10 transactions, while after the modification a block may hold
changes of only 3 transactions. This means that once those three transactions
have committed, the block is actually released. As a result, the probability of
a block being actually released is higher after the modification. Additionally,
I think the parallelism of the test repreduce.sh is higher than that of the
test 002_rb_memory_2.pl, which is also why this modification only fixed the
issue in the test 002_rb_memory_2.pl.
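
To double-check this reasoning, I also wrote a small standalone toy model (my
own simplification with made-up numbers, not the actual generation-context
code): chunks are allocated round-robin across a set of concurrent
transactions, a block holds block_size / chunk_size consecutive chunks, the
transactions then commit one by one, and a block counts as released only once
every chunk on it has been freed. With 8MB blocks essentially no block can be
released until the very last transactions commit, while with 8kB blocks the
blocks are released roughly in step with the commits:

/*
 * toy_gen_blocks.c -- a toy model, NOT actual PostgreSQL code, of how a
 * generation-context block stays allocated until every chunk on it has
 * been freed, and how the block size changes that when chunks of many
 * interleaved transactions share the same blocks.
 */
#include <stdio.h>
#include <stdlib.h>

#define NTXNS           200L    /* interleaved transactions (made up) */
#define CHUNKS_PER_TXN  4000L   /* chunks each transaction allocates */
#define CHUNK_SIZE      256L    /* rough size of one change, in bytes */

static void
simulate(long block_size)
{
    long    cpb = block_size / CHUNK_SIZE;          /* chunks per block */
    long    total = NTXNS * CHUNKS_PER_TXN;         /* total chunks */
    long    nblocks = (total + cpb - 1) / cpb;
    long   *live = calloc(nblocks, sizeof(long));   /* live chunks per block */
    long    live_blocks = nblocks;
    long    i, t;

    /* allocate: chunk i goes to block i / cpb and is owned by txn i % NTXNS */
    for (i = 0; i < total; i++)
        live[i / cpb]++;

    /* commit transactions one by one, freeing all of their chunks */
    for (t = 0; t < NTXNS; t++)
    {
        for (i = t; i < total; i += NTXNS)
        {
            if (--live[i / cpb] == 0)
                live_blocks--;          /* block fully free => released */
        }

        if (t == NTXNS * 9 / 10)        /* snapshot after 90% of commits */
            printf("block size %8ld: %ld of %ld blocks (%ld MB) still allocated "
                   "for %ld MB of live data\n",
                   block_size, live_blocks, nblocks,
                   live_blocks * block_size / (1024 * 1024),
                   (NTXNS - t - 1) * CHUNKS_PER_TXN * CHUNK_SIZE / (1024 * 1024));
    }
    free(live);
}

int
main(void)
{
    simulate(8L * 1024 * 1024);         /* SLAB_LARGE_BLOCK_SIZE, current code */
    simulate(8L * 1024);                /* SLAB_DEFAULT_BLOCK_SIZE, tested change */
    return 0;
}

With these made-up numbers it prints about 200MB still allocated for roughly
18MB of live data when using 8MB blocks, versus about 46MB when using 8kB
blocks. Of course this only models the block retention behaviour, not the
attached tests themselves.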
Please let me know if I'm missing something.
Attached are the modification patch that I used (tmp-modification.patch) and
the two tests mentioned above.
[1] - https://www.postgresql.org/message-id/CAD21AoAa17DCruz4MuJ_5Q_-JOp5FmZGPLDa%3DM9d%2BQzzg8kiBw%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/OS3PR01MB6275A7E5323601D59D18DB979EC29%40OS3PR01MB6275.jpnprd01.prod.outlook.com
Regards,
Wang wei
Attachment | Content-Type | Size |
---|---|---|
tmp-modification.patch | application/octet-stream | 1.6 KB |
002_rb_memory_2.pl | application/octet-stream | 1.1 KB |
repreduce.sh | application/octet-stream | 8.7 KB |