Re: Reduce useless changes before reassembly during logical replication

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: li jie <ggysxcq(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: Reduce useless changes before reassembly during logical replication
Date: 2024-01-18 06:42:45
Message-ID: CALj2ACXYGQ2nYF0LOYi5u7xBqH+MEb8eWsTEvJrx_QrC+107Zw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 17, 2024 at 11:45 AM li jie <ggysxcq(at)gmail(dot)com> wrote:
>
> Hi hackers,
>
> During logical replication, if there is a large write transaction, some
> spill files will be written to disk, depending on the setting of
> logical_decoding_work_mem.
>
> This behavior can effectively avoid OOM, but if the transaction
> generates a lot of changes before committing, a large number of spill
> files may fill the disk. For example, updating a TB-level table can do
> this.
>
> However, I found an inelegant phenomenon. If the modified large table is not
> published, its changes will also be written with a large number of spill files.
> Look at an example below:

Thanks. I agree that decoding and queuing changes for unpublished
tables into the reorder buffer is unnecessary work for the walsender:
it costs CPU to process the changes, consumes disk space for spill
files, and uses the memory configured via logical_decoding_work_mem
for the replication connection inefficiently.

> Later you will see a large number of spill files in the
>
> We can see that table tbl_t1 is not published in mypub. It also won't be sent
> downstream because it's not subscribed.
> After the transaction is reassembled, the pgoutput decoding plugin filters out
> changes to these unpublished relations when sending logical changes.
> See function pgoutput_change.

Right. Here's my testing [1].

> Most importantly, if we filter out changes related to unpublished
> relations after constructing the changes but before queuing the
> changes into a transaction, will it reduce the workload of logical
> decoding and avoid disk or memory growth as much as possible?

Right. It can.

> The patch in the attachment is a prototype, which can effectively reduce the
> memory and disk space usage during logical replication.
>
> Design:
> 1. Added a callback LogicalDecodeFilterByRelCB for the output plugin.
>
> 2. Added this callback function pgoutput_table_filter for the pgoutput plugin.
> Its main implementation is based on the table filter in the
> pgoutput_change function.
> Its main function is to determine whether the change needs to be published based
> on the parameters of the publication, and if not, filter it.
>
> 3. After constructing a change and before queuing the change into a transaction,
> use RelidByRelfilenumber to obtain the relation associated with the change,
> just like obtaining the relation in the ReorderBufferProcessTXN function.
>
> 4. The relation may be a TOAST table, and there is no good way to get its
> real (parent) table relation from the TOAST relation. Here, I get the real
> table's OID through the TOAST relation's name, and then get the real table
> relation.
>
> 5. This filtering takes into account INSERT/UPDATE/DELETE. Other
> changes have not been considered yet and can be expanded in the future.

The design of this patch is based on the principle of filtering things
out early on in logical decoding, and looks very similar to
filter_prepare_cb_wrapper/pg_decode_filter_prepare and
filter_by_origin_cb/pgoutput_origin_filter. Per my understanding, this
design looks okay unless I'm missing anything.
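
FWIW, here's a rough sketch of what I understand the new callback's
shape to be, modeled on the existing LogicalDecodeFilterByOriginCB in
output_plugin.h. The exact parameter list below is my assumption, not
the v1 patch verbatim; returning true means "filter the change out
before it is queued", following the filter_by_origin_cb convention:

/*
 * Hypothetical shape of the proposed filter-by-table callback;
 * returning true filters the change out before it is queued.
 */
typedef bool (*LogicalDecodeFilterByRelCB) (struct LogicalDecodingContext *ctx,
                                            Relation relation,
                                            ReorderBufferChange *change);

with a matching filter_by_table_cb member in OutputPluginCallbacks that
pgoutput implements as pgoutput_table_filter.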

> Test:
> 1. Added a test case 034_table_filter.pl
> 2. Like the case above, create two tables, the published table tbl_pub and
> the non-published table tbl_t1
> 3. Insert 10,000 rows of toast data into tbl_t1 on the publisher, and use
> pg_ls_replslotdir to record the total size of the slot directory
> every second.
> 4. Compare the size of the slot directory at the beginning of the
> transaction (size1), the size at the end of the transaction (size2),
> and the average size of the entire process (size3).
> 5. Assert(size1 == size2 == size3)

I bet that the above test with 10K rows is going to take a noticeable
amount of time on some buildfarm members (it took 6 seconds on my dev
system, an AWS EC2 instance), and the test can get flaky. Therefore,
IMO, the concrete way of testing this feature is to look in the server
logs for the following message using PostgreSQL::Test::Cluster's
log_contains():

+filter_done:
+
+	if (result && RelationIsValid(relation))
+		elog(DEBUG1, "logical filter change by table %s",
+			 RelationGetRelationName(relation));
+

Here are some comments on the v1 patch:
1.
@@ -1415,9 +1419,6 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 	TupleTableSlot *old_slot = NULL;
 	TupleTableSlot *new_slot = NULL;
 
-	if (!is_publishable_relation(relation))
-		return;
-

Instead of removing is_publishable_relation from pgoutput_change, I
think it can just be turned into an assertion
Assert(is_publishable_relation(relation));, no?

2.
+ switch (change->action)
+ {
+ /* intentionally fall through */

Perhaps it must use /* FALLTHROUGH */ just like elsewhere in the code;
otherwise, a compiler warning (-Wimplicit-fallthrough) is thrown.
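
To illustrate, the conventional idiom looks like this (a generic
sketch, not the patch's actual case labels or bodies):

switch (change->action)
{
	case REORDER_BUFFER_CHANGE_INSERT:
		/* INSERT-specific handling, if any, goes here ... */
		/* FALLTHROUGH */
	case REORDER_BUFFER_CHANGE_UPDATE:
	case REORDER_BUFFER_CHANGE_DELETE:
		/* ... then the work shared by all three actions */
		break;
	default:
		/* other change types are not filtered */
		break;
}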

3. From the commit message:

Most of the code in the FilterByTable function is transplanted from
the ReorderBufferProcessTXN function, which can be called before the
ReorderBufferQueueChange function. It is

I think the above note can just go atop the FilterByTable function for
better understanding.

+static bool
+FilterByTable(LogicalDecodingContext *ctx, ReorderBufferChange *change)
+{

4. Why is the FilterByTable(ctx, change) call placed after DecodeXLogTuple
in DecodeInsert, DecodeUpdate and DecodeDelete? Is there a use for the
decoded tuples produced by DecodeXLogTuple in the new callback
filter_by_table_cb? If not, can we move the FilterByTable call before
DecodeXLogTuple to avoid some more extra processing?
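
To make the suggestion concrete, here's a heavily elided sketch of the
reordering I have in mind for DecodeInsert (the elided parts stay as
they are today, and I'm assuming filter_by_table_cb doesn't need the
decoded tuple):

static void
DecodeInsert(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
{
	ReorderBufferChange *change;

	/* ... existing WAL record parsing and fast-path checks ... */

	change = ReorderBufferGetChange(ctx->reorder);
	change->action = REORDER_BUFFER_CHANGE_INSERT;
	/* ... change->data.tp.rlocator is filled from the WAL record ... */

	/* filter early, before paying for tuple decoding */
	if (FilterByTable(ctx, change))
	{
		ReorderBufferReturnChange(ctx->reorder, change, true);
		return;
	}

	/* ... DecodeXLogTuple() fills change->data.tp.newtuple as today ... */

	ReorderBufferQueueChange(ctx->reorder, XLogRecGetXid(buf->record),
							 buf->origptr, change, false);
}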

5. Why is ReorderBufferChange needed as a parameter to FilterByTable
and filter_by_table_cb? Can't just the LogicalDecodingContext, the
relation, and the change action be enough to decide whether the table
is publishable? If done this way, it can avoid some more processing,
no?
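
In other words, a narrower callback shape along these lines might be
enough (again just a sketch with hypothetical parameters; the relation
OID could come from the same RelidByRelfilenumber lookup):

typedef bool (*LogicalDecodeFilterByRelCB) (struct LogicalDecodingContext *ctx,
                                            Oid relid,
                                            ReorderBufferChangeType action);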

6. Please run pgindent and pgperltidy on the new source code and new
TAP test file respectively.

[1]
HEAD:
postgres=# BEGIN;
BEGIN
Time: 0.110 ms
postgres=*# insert into tbl_t1 select i,repeat('xyzzy',
i),repeat('abcba', i),repeat('dfds', i) from generate_series(0,99999)
i;
INSERT 0 100000
Time: 379488.265 ms (06:19.488)
postgres=*#

ubuntu:~/postgres/pg17/bin$ du -sh
/home/ubuntu/postgres/pg17/bin/db17/pg_replslot/mysub
837M /home/ubuntu/postgres/pg17/bin/db17/pg_replslot/mysub
ubuntu:~/postgres/pg17/bin$ du -sh /home/ubuntu/postgres/pg17/bin/db17
2.6G /home/ubuntu/postgres/pg17/bin/db17

PATCHED:
postgres=# BEGIN;
BEGIN
Time: 0.105 ms
postgres=*# insert into tbl_t1 select i,repeat('xyzzy',
i),repeat('abcba', i),repeat('dfds', i) from generate_series(0,99999)
i;
INSERT 0 100000
Time: 380044.554 ms (06:20.045)

ubuntu:~/postgres$ du -sh /home/ubuntu/postgres/pg17/bin/db17/pg_replslot/mysub
8.0K /home/ubuntu/postgres/pg17/bin/db17/pg_replslot/mysub
ubuntu:~/postgres$ du -sh /home/ubuntu/postgres/pg17/bin/db17
1.8G /home/ubuntu/postgres/pg17/bin/db17

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
