From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Improve eviction algorithm in ReorderBuffer
Date: 2023-12-13 00:30:44
Message-ID: CAD21AoC_Jq1WjhhzTWU-vUdNErgHy9o6_fOuuFKPO2n8M2eE1Q@mail.gmail.com
Lists: pgsql-hackers
On Tue, Dec 12, 2023 at 1:33 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Tue, Dec 12, 2023 at 9:01 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > Hi all,
> >
> > As the comment of ReorderBufferLargestTXN() says, it's very slow with
> > many subtransactions:
> >
> > /*
> > * Find the largest transaction (toplevel or subxact) to evict (spill to disk).
> > *
> > * XXX With many subtransactions this might be quite slow, because we'll have
> > * to walk through all of them. There are some options how we could improve
> > * that: (a) maintain some secondary structure with transactions sorted by
> > * amount of changes, (b) not looking for the entirely largest transaction,
> > * but e.g. for transaction using at least some fraction of the memory limit,
> > * and (c) evicting multiple transactions at once, e.g. to free a given portion
> > * of the memory limit (e.g. 50%).
> > */
> >
> > This is because the reorderbuffer keeps a transaction entry for each
> > top-level and sub transaction, and ReorderBufferLargestTXN() walks
> > through all of those entries to pick the transaction to evict.
> > I've heard an internal report that replication lag became huge when
> > decoding transactions each consisting of 500k subtransactions. Note
> > that ReorderBufferLargestTXN() is used only in non-streaming mode.
> >
> > Here is a test script for a many-subtransactions scenario. In my
> > environment, the logical decoding took over 2 min to decode one top
> > transaction having 100k subtransactions.
> >
> > -----
> > -- assumes a logical replication slot 's' was created beforehand, e.g.:
> > -- select pg_create_logical_replication_slot('s', 'test_decoding');
> > create table test (c int);
> > create or replace function testfn (cnt int) returns void as $$
> > begin
> >   for i in 1..cnt loop
> >     begin
> >       insert into test values (i);
> >     exception when division_by_zero then
> >       raise notice 'caught error';
> >       return;
> >     end;
> >   end loop;
> > end;
> > $$ language plpgsql;
> > select testfn(100000);
> > set logical_decoding_work_mem to '4MB';
> > select count(*) from pg_logical_slot_peek_changes('s', null, null);
> > -----
> >
> > To deal with this problem, I initially thought of idea (a)
> > mentioned in the comment: use a binary heap to keep the
> > transactions sorted by the amount of changes, i.e. the size. But it
> > seems not a good idea to try maintaining all transactions by their
> > size, since the size of each transaction can change frequently.
> >
> > The attached patch uses a different approach consisting of three
> > strategies: (1) maintain the list of transactions whose size is larger
> > than 10% of logical_decoding_work_mem, and preferentially evict a
> > transaction from this list. If the list is empty, all transactions are
> > small enough, (2) so we evict the oldest top-level transaction from
> > the rb->toplevel_by_lsn list. Evicting older transactions helps in
> > freeing memory blocks in the GenerationContext. Finally, if this is
> > also empty, (3) we evict a transaction whose size is > 0. Here, note
> > that even if a transaction is evicted, its ReorderBufferTXN entry is
> > not removed from rb->by_txn; its size just becomes 0. In the worst
> > case, where all (quite a few) transactions are smaller than 10% of
> > the memory limit, we might end up checking many transactions to find
> > entries with a non-zero size to evict. So the patch adds a new list
> > to maintain all transactions that have at least one change in memory.
> >
> > Summarizing the algorithm I've implemented in the patch,
> >
> > 1. pick a transaction from the list of large transactions (larger than
> > 10% of memory limit).
> > 2. pick a transaction from the top-level transaction list in LSN order.
> > 3. pick a transaction from the list of transactions that have at least
> > one change in memory.
> >
> > With the patch, the above test case completed within 3 seconds in my
> > environment.
>
> Thanks for working on this. I think it would be good to test other
> scenarios as well, where this might have some negative impact, and see
> where we stand.

Agreed.

> 1) A scenario where you have one very large transaction that is
> consuming ~40% of the memory limit and 5-6 comparatively smaller
> transactions that are each just above 10% of the limit. Now, to come
> under the memory limit, instead of evicting the one large transaction,
> we end up evicting multiple smaller ones.

Given that the large transaction list will have up to 10 transactions,
I think it's cheap to pick the largest one among them: it's O(N), but N
won't be large.
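
To make the discussion concrete, here is a hypothetical, heavily
simplified sketch of the three-step pick, with the large-list step
choosing the largest entry as suggested above. It is plain C with an
array standing in for the reorderbuffer's dlists; all names are made
up and this is not the patch's actual code:

```c
#include <stddef.h>

/* Hypothetical stand-in for ReorderBufferTXN: just an LSN and a size. */
typedef struct SimTXN
{
    int     lsn;    /* ordering proxy for the transaction's first LSN */
    size_t  size;   /* bytes of changes currently in memory */
} SimTXN;

/*
 * Return the index of the transaction to evict, or -1 if no transaction
 * has changes in memory.  Mirrors the selection priority described
 * upthread:
 *   (1) the largest transaction above 10% of the memory limit,
 *   (2)/(3) otherwise the oldest (lowest-LSN) transaction with size > 0.
 */
static int
pick_txn_to_evict(const SimTXN *txns, int n, size_t mem_limit)
{
    size_t  threshold = mem_limit / 10;
    int     best = -1;
    int     i;

    /* (1) prefer the largest among "large" transactions */
    for (i = 0; i < n; i++)
    {
        if (txns[i].size > threshold &&
            (best == -1 || txns[i].size > txns[best].size))
            best = i;
    }
    if (best != -1)
        return best;

    /* (2)/(3) fall back to the oldest transaction that has any changes */
    for (i = 0; i < n; i++)
    {
        if (txns[i].size > 0 &&
            (best == -1 || txns[i].lsn < txns[best].lsn))
            best = i;
    }
    return best;
}
```

In the actual patch the candidates come from dedicated lists rather
than a full scan (which is the whole point); the scan here only encodes
the selection priority.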
> 2) Another scenario where all the transactions are under 10% of the
> memory limit, but let's say some transactions are consuming around
> 8-9% of the memory limit each, and those are not very old
> transactions, whereas there are certain old transactions which are
> fairly small, consuming under 1% of the memory limit, and there are
> many such transactions. How would it affect things if we frequently
> select many of these small transactions to come under the memory
> limit, instead of selecting a couple of large transactions consuming
> 8-9%?

Yeah, probably we can do something for small transactions (i.e. small
on-memory transactions). One idea is to pick the largest transaction
among them by iterating over all of them. Given that the more
transactions are evicted, the fewer transactions the on-memory
transaction list has, unlike the current algorithm we would still win.
Or we could even split it into several sub-lists in order to reduce the
number of transactions to check. For example, split it into two lists:
transactions consuming less than 5% of the memory limit and those
consuming at least 5%, and preferentially check the latter. The cost of
maintaining these lists could increase, though.
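
For illustration, the two-list split could be as simple as classifying
a transaction by the size threshold and re-linking it only when an
update crosses that boundary. This is a hypothetical sketch with
made-up names, using the 5% boundary mentioned above:

```c
#include <stdbool.h>
#include <stddef.h>

/* Which sub-list an on-memory transaction belongs to. */
typedef enum TXNSubList
{
    TXN_LIST_SMALL,     /* < 5% of logical_decoding_work_mem */
    TXN_LIST_MEDIUM     /* >= 5%, checked preferentially on eviction */
} TXNSubList;

static TXNSubList
txn_sublist_for(size_t txn_size, size_t mem_limit)
{
    /* mem_limit / 20 == 5% of the memory limit */
    return (txn_size >= mem_limit / 20) ? TXN_LIST_MEDIUM : TXN_LIST_SMALL;
}

/*
 * After a size update, report whether the transaction has to be moved
 * to the other sub-list.  This is the maintenance cost worried about
 * above, but it is paid only when an update crosses the boundary, not
 * on every change.
 */
static bool
txn_needs_relink(size_t old_size, size_t new_size, size_t mem_limit)
{
    return txn_sublist_for(old_size, mem_limit) !=
           txn_sublist_for(new_size, mem_limit);
}
```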
Do you have any ideas?
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com