Re: Slow catchup of 2PC (twophase) transactions on replica in LR

From: Ajin Cherian <itsajin(at)gmail(dot)com>
To: Давыдов Виталий <v(dot)davydov(at)postgrespro(dot)ru>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Date: 2024-02-23 04:52:11
Message-ID: CAFPTHDaf9sc3VZWZKr3-xf2jv+gF6q3ywipofc-+5GHyCJRSCQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Feb 23, 2024 at 12:29 AM Давыдов Виталий <v(dot)davydov(at)postgrespro(dot)ru>
wrote:

> Dear All,
>
> I'd like to present and discuss a problem where 2PC transactions are
> applied quite slowly on a replica during logical replication. There is a
> master and a replica, with logical replication established from the master
> to the replica with two_phase = true. Under some load on the master,
> the replica starts to lag behind the master, and the lag keeps
> increasing. We have to significantly decrease the load on the master to
> allow the replica to complete the catchup. Such a problem may create
> significant difficulties in production. The problem appears at least on
> the REL_16_STABLE branch.
>
> To reproduce the problem:
>
> - Set up logical replication from the master to the replica with the
> subscription parameter two_phase = true.
> - Create a moderate load on the master (use pgbench with a custom SQL
> script that does PREPARE TRANSACTION + COMMIT PREPARED).
> - Optionally, switch off the replica for some time (keeping the load on
> the master).
> - Switch on the replica and wait until it catches up with the master.
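The setup in the steps above can be scripted roughly as follows. This is a minimal sketch, not the reporter's actual script: the ports, the database, the table definition (the thread never shows one; a primary key is assumed so that the workload's DELETE is allowed on a published table) and the names pub_2pc/sub_2pc are all hypothetical. It also assumes wal_level = logical on the publisher and max_prepared_transactions > 0 on both nodes, since the publisher runs PREPARE TRANSACTION and, with two_phase enabled, the subscriber's apply worker prepares the transaction too. Note that the subscription option is spelled two_phase.

```shell
#!/usr/bin/env bash
# Sketch of the reproduction setup: one publisher, one subscriber, two_phase on.
# Meant to be sourced. Ports, database, table and object names are hypothetical.
PUB_PORT=${PUB_PORT:-5432}   # publisher instance (assumed port)
SUB_PORT=${SUB_PORT:-5433}   # subscriber instance (assumed port)
DB=${DB:-postgres}

# Prerequisites (postgresql.conf, restart required):
#   publisher:  wal_level = logical, max_prepared_transactions > 0
#   subscriber: max_prepared_transactions > 0

setup_replication() {
  local port
  # Logical replication does not copy DDL, so create the table on both nodes.
  # The primary key gives the table a replica identity, which the workload's
  # DELETE needs once the table is part of a publication.
  for port in "$PUB_PORT" "$SUB_PORT"; do
    psql -X -p "$port" -d "$DB" -c "CREATE TABLE test (v int PRIMARY KEY)" || return
  done

  psql -X -p "$PUB_PORT" -d "$DB" \
    -c "CREATE PUBLICATION pub_2pc FOR TABLE test" || return

  # Decode and apply prepared transactions at PREPARE time.
  psql -X -p "$SUB_PORT" -d "$DB" \
    -c "CREATE SUBSCRIPTION sub_2pc
          CONNECTION 'port=$PUB_PORT dbname=$DB'
          PUBLICATION pub_2pc
          WITH (two_phase = true)"
}
```

After sourcing the file, call setup_replication once with PUB_PORT and SUB_PORT pointing at the two running instances.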
>
> The replica never catches up with the master, even under a low load on the
> master. If the load is removed, the replica does catch up, but it takes much
> longer than expected. I tried the same with regular (non-2PC) transactions,
> and the problem doesn't appear even under a decent load.
>
I tried this setup, and I do see the logical subscriber catch up with the
master in a short time. I'm not sure what I'm missing. I stopped the
logical subscriber partway through while pgbench was running, then started
it again and ran the following on the master:
postgres=# SELECT sent_lsn, pg_current_wal_lsn() FROM pg_stat_replication;
 sent_lsn  | pg_current_wal_lsn
-----------+--------------------
 0/6793FA0 | 0/6793FA0            <=== caught up
(1 row)
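Matching sent_lsn against pg_current_wal_lsn() shows that the walsender has sent everything; the question in this thread is how long the subscriber takes to apply the backlog. A minimal sketch, assuming the publisher listens on PUB_PORT (hypothetical default 5432): it polls pg_stat_replication on the master, measures the distance in bytes between pg_current_wal_lsn() and replay_lsn (which is based on the position the subscriber reports back) with pg_wal_lsn_diff(), and prints how long it took for that lag to drop to a threshold.

```shell
#!/usr/bin/env bash
# Poll the publisher until the subscriber's reported replay position is within
# MAX_LAG bytes of the current WAL position, then print the elapsed time.
# Port and database are hypothetical defaults; meant to be sourced.
PUB_PORT=${PUB_PORT:-5432}
DB=${DB:-postgres}

# Largest remaining apply lag across walsenders, in bytes (empty if none).
LAG_QUERY="SELECT max(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn))::bigint
             FROM pg_stat_replication"

wait_for_catchup() {
  local max_lag=${1:-0} interval=${2:-1}
  local start=$SECONDS lag
  while true; do
    # -X: skip psqlrc, -A -t: print just the bare value.
    lag=$(psql -X -A -t -p "$PUB_PORT" -d "$DB" -c "$LAG_QUERY")
    echo "apply lag: ${lag:-unknown} bytes"
    if [[ -n "$lag" ]] && (( lag <= max_lag )); then
      echo "caught up after $((SECONDS - start)) s"
      return 0
    fi
    sleep "$interval"
  done
}
```

For example, `wait_for_catchup 0 5` right after switching the replica back on. Under continuous load the current LSN keeps moving, so a small non-zero threshold may be more practical than 0.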

My pgbench command:
pgbench postgres -p 6972 -c 2 -j 3 -f /home/ajin/test.sql -T 200 -P 5

My custom SQL file:
cat test.sql
SELECT md5(random()::text) as mygid \gset
BEGIN;
DELETE FROM test WHERE v = pg_backend_pid();
INSERT INTO test(v) SELECT pg_backend_pid();
PREPARE TRANSACTION $$:mygid$$;
COMMIT PREPARED $$:mygid$$;
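The optional "switch off the replica" step from the original report can be driven together with this workload. A minimal sketch, assuming a subscription named sub_2pc (hypothetical) and that pausing the apply worker with ALTER SUBSCRIPTION ... DISABLE is an acceptable stand-in for shutting the subscriber down, as was done in the test above; the two are not guaranteed to behave identically. The pgbench options mirror the command above, with two clients and one thread per client.

```shell
#!/usr/bin/env bash
# Run the 2PC workload from test.sql on the publisher, pause the subscription
# part-way through so a backlog builds up, then resume it.
# Ports, database and subscription name are hypothetical; meant to be sourced.
PUB_PORT=${PUB_PORT:-5432}
SUB_PORT=${SUB_PORT:-5433}
DB=${DB:-postgres}
SUB_NAME=${SUB_NAME:-sub_2pc}

run_with_pause() {
  local duration=${1:-200} pause_at=${2:-30} pause_for=${3:-60}

  # Background workload: 2 clients, progress report every 5 seconds.
  pgbench "$DB" -p "$PUB_PORT" -c 2 -j 2 -f test.sql -T "$duration" -P 5 &
  local bench_pid=$!

  sleep "$pause_at"
  # Stop applying changes without shutting down the subscriber instance.
  psql -X -p "$SUB_PORT" -d "$DB" -c "ALTER SUBSCRIPTION $SUB_NAME DISABLE"
  sleep "$pause_for"
  psql -X -p "$SUB_PORT" -d "$DB" -c "ALTER SUBSCRIPTION $SUB_NAME ENABLE"

  # Wait for pgbench to finish its run.
  wait "$bench_pid"
}
```

For example, `run_with_pause 200 30 60` keeps the load running for 200 seconds and leaves the subscription disabled from second 30 to second 90.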

regards,
Ajin Cherian
Fujitsu Australia
