From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Improve eviction algorithm in ReorderBuffer
Date: 2023-12-15 05:59:16
Message-ID: CAD21AoDZqKvLG0D7sE1D5MSZu+hGmPHPbq_SB6Vr0gQ-zzo-nQ@mail.gmail.com
Lists: pgsql-hackers
On Fri, Dec 15, 2023 at 12:37 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Dec 13, 2023 at 6:01 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Tue, Dec 12, 2023 at 1:33 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > >
> > > On Tue, Dec 12, 2023 at 9:01 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > I've heard a report internally that replication lag became huge when
> > > > decoding transactions each consisting of 500k subtransactions. Note
> > > > that ReorderBufferLargestTXN() is used only in non-streaming mode.
> > > >
>
> Can't you suggest that they use streaming mode to avoid this problem,
> or do you see some problem with that?
Yeah, that's one option I can suggest to them.
>
> > > > Here is a test script for a many subtransactions scenario. In my
> > > > environment, the logical decoding took over 2min to decode one top
> > > > transaction having 100k subtransactions.
> > > >
> > > > -----
> > > > create table test (c int);
> > > > create or replace function testfn (cnt int) returns void as $$
> > > > begin
> > > >   for i in 1..cnt loop
> > > >     begin
> > > >       insert into test values (i);
> > > >     exception when division_by_zero then
> > > >       raise notice 'caught error';
> > > >       return;
> > > >     end;
> > > >   end loop;
> > > > end;
> > > > $$
> > > > language plpgsql;
> > > > -- assumes a logical slot 's' was created beforehand, e.g. with
> > > > -- select pg_create_logical_replication_slot('s', 'test_decoding');
> > > > select testfn(100000);
> > > > set logical_decoding_work_mem to '4MB';
> > > > select count(*) from pg_logical_slot_peek_changes('s', null, null);
> > > > ----
> > > >
> > > > To deal with this problem, I initially thought of the idea (a)
> > > > mentioned in the comment: use a binary heap to keep the transactions
> > > > sorted by the amount of changes or by size. But it doesn't seem like a
> > > > good idea to keep all transactions ordered by size, since the size of
> > > > each transaction can change frequently.
> > > >
> > > > The attached patch uses a different approach that consists of three
> > > > strategies: (1) maintain a list of transactions whose size is larger
> > > > than 10% of logical_decoding_work_mem, and preferentially evict a
> > > > transaction from this list.
> > > >
>
> IIUC, you are giving preference to the multiple-lists idea as compared
> to (a) because you don't need to adjust the list each time the
> transaction size changes, is that right?
Right.
> If so, I think there is a
> cost to keep that data structure up-to-date but it can help in
> reducing the number of times we need to serialize.
Yes, there is a trade-off.
What I don't want to do is to keep all transactions ordered since it's
too costly. The proposed idea uses multiple lists to keep all
transactions roughly ordered. The maintenance cost would be cheap
since each list is unordered.
It might be a good idea to have a threshold for switching how we pick
the largest transaction, based on the number of transactions in the
reorderbuffer. If there are many transactions, we can use the proposed
algorithm to find a possibly-largest transaction; otherwise, use the
current way.
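Very roughly, such a switch could look like the following. This is a
standalone sketch with simplified stand-in types (Txn/Buf) and a made-up
threshold, not code from the attached patch:

-----
/*
 * Sketch only: switch between the exact full scan and the short
 * large-transaction list depending on how many transactions the
 * reorder buffer currently tracks.
 */
#include <stddef.h>

typedef struct Txn
{
    size_t      size;       /* bytes of changes currently in memory */
    struct Txn *next;       /* next entry in whichever list holds us */
} Txn;

typedef struct Buf
{
    Txn    *all_txns;       /* every transaction with changes in memory */
    int     ntxns;          /* length of all_txns */
    Txn    *large_txns;     /* txns using more than 10% of the limit */
} Buf;

#define SCAN_THRESHOLD 32   /* hypothetical cut-over point */

/* Return the largest transaction in a singly linked list (O(N) scan). */
static Txn *
largest_in(Txn *head)
{
    Txn *largest = NULL;

    for (Txn *t = head; t != NULL; t = t->next)
        if (largest == NULL || t->size > largest->size)
            largest = t;
    return largest;
}

/*
 * With few transactions the exact full scan (what the current code does) is
 * cheap; with many, scan only the short large-transaction list, which gives
 * a possibly-largest answer without touching every transaction.
 */
Txn *
pick_txn_to_evict(Buf *rb)
{
    if (rb->ntxns <= SCAN_THRESHOLD || rb->large_txns == NULL)
        return largest_in(rb->all_txns);
    return largest_in(rb->large_txns);
}
-----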
>
> > > > (2) If the list is empty, all transactions are small enough, so we
> > > > evict the oldest top-level transaction from the rb->toplevel_by_lsn
> > > > list. Evicting older transactions would help in freeing memory blocks
> > > > in GenerationContext. (3) Finally, if this is also empty, we evict a
> > > > transaction whose size is > 0. Here, we need to note that even if a
> > > > transaction is evicted, its ReorderBufferTXN entry is not removed from
> > > > rb->by_txn; only its size becomes 0. In the worst case, where all
> > > > (quite a few) transactions are smaller than 10% of the memory limit,
> > > > we might end up checking many transactions to find entries with a
> > > > non-zero size to evict. So the patch adds a new list to maintain all
> > > > transactions that have at least one change in memory.
> > > >
> > > > Summarizing the algorithm I've implemented in the patch,
> > > >
> > > > 1. pick a transaction from the list of large transactions (larger than
> > > > 10% of memory limit).
> > > > 2. pick a transaction from the top-level transaction list in LSN order.
> > > > 3. pick a transaction from the list of transactions that have at least
> > > > one change in memory.
> > > >
> > > > With the patch, the above test case completed within 3 seconds in my
> > > > environment.
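For reference, the pick order above boils down to roughly the following.
This is a simplified standalone sketch with stand-in types and illustrative
list names, not the code in the attached patch:

-----
/* Sketch of the three fallbacks, tried in order until one yields a victim. */
#include <stddef.h>

typedef struct Txn
{
    size_t      size;       /* bytes of changes currently in memory */
    struct Txn *next;
} Txn;

typedef struct Buf
{
    Txn *large_txns;        /* (1) txns larger than 10% of the limit */
    Txn *toplevel_by_lsn;   /* (2) top-level txns, oldest LSN first */
    Txn *txns_with_changes; /* (3) txns with at least one change in memory */
} Buf;

Txn *
pick_victim(Buf *rb)
{
    /* (1) preferably evict a transaction from the large-transaction list */
    if (rb->large_txns != NULL)
        return rb->large_txns;

    /*
     * (2) otherwise evict the oldest top-level transaction; freeing older
     * changes helps GenerationContext release whole memory blocks
     */
    if (rb->toplevel_by_lsn != NULL)
        return rb->toplevel_by_lsn;

    /*
     * (3) finally, fall back to the list of transactions that have at least
     * one change in memory; evicted transactions stay in rb->by_txn with
     * size 0, which is why this separate list avoids rescanning them
     */
    return rb->txns_with_changes;   /* may be NULL if nothing is evictable */
}
-----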
> > >
> > > Thanks for working on this. I think it would be good to test other
> > > scenarios as well where this might have some negative impact and see
> > > where we stand.
> >
> > Agreed.
> >
> > > 1) A scenario where, suppose, you have one very large transaction
> > > consuming ~40% of the memory and 5-6 comparatively smaller transactions
> > > that are just above 10% of the memory limit. Now, to come back under
> > > the memory limit, instead of evicting the one large transaction we end
> > > up evicting multiple smaller ones.
> >
> > Given that the large-transaction list will have up to 10 transactions,
> > I think it's cheap to pick the largest transaction among them. It's
> > O(N), but N won't be large.
> >
> > > 2) Another scenario where all the transactions are under 10% of the
> > > memory limit, but let's say some transactions are consuming around
> > > 8-9% of the memory limit each, and those are not very old, whereas
> > > there are many old transactions that are fairly small, each consuming
> > > under 1% of the memory limit. How would it affect things if we
> > > frequently select many of these small transactions to come under the
> > > memory limit, instead of selecting a couple of large transactions that
> > > are consuming 8-9%?
> >
> > Yeah, probably we can do something for small transactions (i.e. small,
> > on-memory transactions). One idea is to pick the largest transaction
> > among them by iterating over all of them. Given that the more
> > transactions are evicted, the fewer transactions the on-memory
> > transaction list has, unlike the current algorithm we would still win.
> > Or we could even split it into several sub-lists in order to reduce
> > the number of transactions to check. For example, we could split it
> > into two lists: transactions consuming less than 5% of the memory
> > limit and those consuming 5% or more, and preferably check the latter
> > list.
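For illustration, the two-sub-list bookkeeping could look roughly like this;
the 5% threshold and all names here are hypothetical, not taken from any
patch:

-----
/*
 * Hypothetical sketch: route an on-memory transaction into one of two
 * unordered lists based on a 5% threshold, and prefer the ">= 5%" list when
 * looking for an eviction candidate. Real code would also unlink the
 * transaction from its previous list; omitted for brevity.
 */
#include <stdbool.h>
#include <stddef.h>

typedef struct Txn
{
    size_t      size;       /* bytes of changes currently in memory */
    struct Txn *next;
} Txn;

typedef struct Buf
{
    size_t  limit;          /* logical_decoding_work_mem, in bytes */
    Txn    *ge_5pct;        /* on-memory txns using >= 5% of the limit */
    Txn    *lt_5pct;        /* on-memory txns using < 5% of the limit */
} Buf;

/* Re-bucket a transaction after its size changed. */
void
classify_txn(Buf *rb, Txn *txn)
{
    bool    big = (txn->size * 20 >= rb->limit);    /* size >= 5% of limit */
    Txn   **head = big ? &rb->ge_5pct : &rb->lt_5pct;

    txn->next = *head;
    *head = txn;
}

/* Prefer the ">= 5%" bucket; fall back to the small bucket only when empty. */
Txn *
pick_candidate(Buf *rb)
{
    return (rb->ge_5pct != NULL) ? rb->ge_5pct : rb->lt_5pct;
}
-----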
> >
>
> Which memory limit are you referring to here? Is it logical_decoding_work_mem?
logical_decoding_work_mem.
>
> > The cost for
> > maintaining these lists could increase, though.
> >
>
> Yeah, can't we maintain a single list of all xacts that are consuming
> equal to or greater than the memory limit? Considering that the memory
> limit is logical_decoding_work_mem, I think just picking one
> transaction to serialize would be sufficient.
IIUC we serialize a transaction when the sum of all transactions'
memory usage in the reorderbuffer exceeds logical_decoding_work_mem.
In what cases would multiple transactions each be consuming
logical_decoding_work_mem or more?
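To spell out the check I'm describing, the overall shape is roughly as
follows (a simplified standalone sketch with stand-in types, not the actual
ReorderBufferCheckMemoryLimit code):

-----
/*
 * Sketch only: the limit applies to the total in-memory size across all
 * transactions, and we keep evicting the (probably) largest one until we
 * are back under the limit.
 */
#include <stddef.h>

typedef struct Txn
{
    size_t      size;       /* bytes of changes currently in memory */
    struct Txn *next;
} Txn;

typedef struct Buf
{
    size_t  size;           /* sum of all transactions' in-memory changes */
    size_t  limit;          /* logical_decoding_work_mem, in bytes */
    Txn    *txns;           /* all transactions known to the buffer */
} Buf;

/* Pick the transaction with the most changes in memory (O(N) scan). */
static Txn *
pick_largest(Buf *rb)
{
    Txn *largest = NULL;

    for (Txn *t = rb->txns; t != NULL; t = t->next)
        if (t->size > 0 && (largest == NULL || t->size > largest->size))
            largest = t;
    return largest;
}

/* Stand-in for serialization: spill the changes, keep the entry around. */
static void
serialize_txn(Buf *rb, Txn *txn)
{
    rb->size -= txn->size;
    txn->size = 0;
}

void
check_memory_limit(Buf *rb)
{
    while (rb->size >= rb->limit)
    {
        Txn *victim = pick_largest(rb);

        if (victim == NULL)
            break;              /* nothing evictable */
        serialize_txn(rb, victim);
    }
}
-----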
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com