From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Improve eviction algorithm in ReorderBuffer
Date: 2023-12-15 03:36:50
Message-ID: CAA4eK1LpCosf6RW+g0+UcEJtGaP=OH1zRveibpWfSzAHBx1VMg@mail.gmail.com
Lists: pgsql-hackers
On Wed, Dec 13, 2023 at 6:01 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Tue, Dec 12, 2023 at 1:33 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Tue, Dec 12, 2023 at 9:01 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > I've heard a report internally that replication lag became huge when
> > > decoding transactions each consisting of 500k subtransactions. Note
> > > that ReorderBufferLargestTXN() is used only in non-streaming mode.
> > >
Can't you suggest that they use streaming mode to avoid this problem, or
do you see some problem with that?
> > > Here is a test script for a scenario with many subtransactions. In my
> > > environment, the logical decoding took over 2min to decode one top
> > > transaction having 100k subtransactions.
> > >
> > > -----
> > > create table test (c int);
> > > create or replace function testfn (cnt int) returns void as $$
> > > begin
> > >   for i in 1..cnt loop
> > >     begin
> > >       insert into test values (i);
> > >     exception when division_by_zero then
> > >       raise notice 'caught error';
> > >       return;
> > >     end;
> > >   end loop;
> > > end;
> > > $$
> > > language plpgsql;
> > > -- assuming the slot 's' is created with the test_decoding plugin
> > > select pg_create_logical_replication_slot('s', 'test_decoding');
> > > select testfn(100000);
> > > set logical_decoding_work_mem to '4MB';
> > > select count(*) from pg_logical_slot_peek_changes('s', null, null);
> > > -----
> > >
> > > To deal with this problem, I initially thought of the idea (a)
> > > mentioned in the comment; use a binary heap to maintain the
> > > transactions sorted by the amount of changes or the size. But it
> > > doesn't seem like a good idea to maintain all transactions by size,
> > > since the size of each transaction can change frequently.
> > >
> > > The attached patch uses a different approach that consists of three
> > > strategies: (1) maintain a list of transactions whose size is larger
> > > than 10% of logical_decoding_work_mem, and preferentially evict a
> > > transaction from this list.
> > >
IIUC, you are preferring the multiple-list idea over (a) because you
don't need to adjust the lists each time a transaction's size changes,
is that right? If so, there is still a cost to keeping those data
structures up-to-date, but it can help reduce the number of times we
need to serialize.
> > > If the list is empty, all transactions are
> > > small enough, (2) so we evict the oldest top-level transaction from
> > > the rb->toplevel_by_lsn list. Evicting older transactions helps free
> > > memory blocks in the GenerationContext. Finally, if this is also
> > > empty, (3) we evict a transaction whose size is > 0. Note that even
> > > if a transaction is evicted, its ReorderBufferTXN entry is not
> > > removed from rb->by_txn; its size just becomes 0. In the worst case,
> > > where all (quite a few) transactions are smaller than 10% of the
> > > memory limit, we might end up checking many transactions to find
> > > entries with a non-zero size to evict. So the patch adds a new list
> > > to maintain all transactions that have at least one change in memory.
> > >
> > > Summarizing the algorithm I've implemented in the patch,
> > >
> > > 1. pick a transaction from the list of large transactions (larger than
> > > 10% of memory limit).
> > > 2. pick a transaction from the top-level transaction list in LSN order.
> > > 3. pick a transaction from the list of transactions that have at least
> > > one change in memory.
> > >
> > > With the patch, the above test case completed within 3 seconds in my
> > > environment.
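Just to make sure I'm reading the pick order correctly, here is a rough
standalone sketch of strategies (1)-(3) (all type and list names are made
up for illustration; this is not the patch's code):

/*
 * Rough model of the pick order described above.  The three lists mirror
 * the ones mentioned in the description; everything else is invented.
 */
#include <stdio.h>
#include <stddef.h>

typedef struct ModelTXN
{
    const char      *name;      /* label for the demo output */
    size_t           size;      /* bytes of changes held in memory */
    struct ModelTXN *next;      /* next entry on the same list */
} ModelTXN;

/* (1) transactions larger than 10% of logical_decoding_work_mem */
static ModelTXN *large_txns;
/* (2) top-level transactions, oldest (smallest start LSN) first */
static ModelTXN *toplevel_by_lsn;
/* (3) transactions that still have at least one change in memory */
static ModelTXN *txns_with_changes;

/* Choose which transaction to serialize next. */
static ModelTXN *
pick_txn_to_evict(void)
{
    /* (1) prefer a transaction above 10% of the memory limit */
    if (large_txns != NULL)
        return large_txns;

    /*
     * (2) otherwise the oldest top-level transaction; freeing its changes
     * should release whole blocks in the generation context.  (The real
     * code presumably also has to skip entries whose size is already 0.)
     */
    if (toplevel_by_lsn != NULL)
        return toplevel_by_lsn;

    /* (3) otherwise any transaction that still has changes in memory */
    return txns_with_changes;
}

int
main(void)
{
    ModelTXN small = {"small", 100 * 1024, NULL};
    ModelTXN big = {"big", 1024 * 1024, NULL};

    toplevel_by_lsn = &small;
    txns_with_changes = &small;
    large_txns = &big;          /* big exceeds 10% of a 4MB limit */

    printf("evict: %s\n", pick_txn_to_evict()->name);  /* -> big */

    large_txns = NULL;          /* after big has been serialized */
    printf("evict: %s\n", pick_txn_to_evict()->name);  /* -> small */
    return 0;
}

If that matches the patch, then the list in (3) is basically there so
that we never have to wade through many already-serialized (size == 0)
entries, right?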
> >
> > Thanks for working on this. I think it would be good to also test
> > scenarios where this might have some negative impact and see where we
> > stand.
>
> Agreed.
>
> > 1) A scenario where you have one very large transaction consuming ~40%
> > of the memory and 5-6 comparatively smaller transactions that are just
> > above 10% of the memory limit. Now, to come back under the memory
> > limit, instead of evicting the one large transaction we end up
> > evicting multiple times.
>
> Given that the large transaction list will have up to 10 transactions, I
> think it's cheap to pick the largest transaction among them. It's O(N),
> but N won't be large.
>
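Right, with that list capped at roughly 10 entries, a plain linear scan
should indeed be cheap; something like this (sketch only, made-up names):

#include <stddef.h>

typedef struct LargeTXN
{
    size_t           size;      /* bytes of changes held in memory */
    struct LargeTXN *next;      /* next transaction on the "large" list */
} LargeTXN;

/* Return the biggest transaction on a short list; O(N) with small N. */
LargeTXN *
largest_txn(LargeTXN *head)
{
    LargeTXN *largest = NULL;

    for (LargeTXN *txn = head; txn != NULL; txn = txn->next)
    {
        if (largest == NULL || txn->size > largest->size)
            largest = txn;
    }
    return largest;
}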
> > 2) Another scenario where all the transactions are under 10% of the
> > memory limit, but some transactions are consuming around 8-9% of the
> > memory limit each and are not very old, whereas there are many old
> > transactions that are fairly small, each consuming under 1% of the
> > memory limit. How would it affect things if, to come back under the
> > memory limit, we frequently select many of these small transactions
> > instead of a couple of large transactions consuming 8-9%?
>
> Yeah, probably we can do something for small transactions (i.e. small,
> in-memory transactions). One idea is to pick the largest transaction
> among them by iterating over all of them. Given that the more
> transactions are evicted, the fewer transactions the in-memory
> transaction list has, unlike the current algorithm we would still win.
> Or we could even split it into several sub-lists to reduce the number of
> transactions to check. For example, split it into two lists:
> transactions consuming less than 5% of the memory limit and those
> consuming 5% or more, and preferentially check the latter list.
>
Which memory limit are you referring to here? Is it logical_decoding_work_mem?
> The cost for
> maintaining these lists could increase, though.
>
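Assuming the limit you mean is logical_decoding_work_mem, I read the
two-list idea as deciding membership by a 5% threshold when a
transaction's changes are added, roughly like this (sketch only, made-up
names and an example limit):

#include <stddef.h>

typedef struct BucketTXN
{
    size_t            size;     /* bytes of changes held in memory */
    struct BucketTXN *next;     /* next transaction in the same bucket */
} BucketTXN;

static BucketTXN *txns_ge_5pct;     /* consuming >= 5% of the limit */
static BucketTXN *txns_lt_5pct;     /* consuming < 5% of the limit */
static size_t     mem_limit = 4 * 1024 * 1024;  /* e.g. 4MB work_mem */

/* Put a transaction on the bucket matching its current size; the
 * ">= 5%" bucket would be scanned first when choosing what to evict. */
void
add_to_bucket(BucketTXN *txn)
{
    BucketTXN **head =
        (txn->size >= mem_limit / 20) ? &txns_ge_5pct : &txns_lt_5pct;

    txn->next = *head;
    *head = txn;
}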
Yeah, can't we maintain a single list of all xacts that are consuming
equal to or greater than the memory limit? Considering that the memory
limit is logical_decoding_work_mem, I think just picking one transaction
to serialize would be sufficient.
--
With Regards,
Amit Kapila.