Re: restarting logical replication after upgrading standby

From: Vijaykumar Jain <vijaykumarjain(dot)github(at)gmail(dot)com>
To: Axel Rau <Axel(dot)Rau(at)chaos1(dot)de>
Cc: Pgsql-admin <pgsql-admin(at)lists(dot)postgresql(dot)org>
Subject: Re: restarting logical replication after upgrading standby
Date: 2022-02-06 19:03:04
Message-ID: CAM+6J96j4qErattg-DiqRKFE80GFTn6ag7-Q2A7DKQX1UWUOGQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

On Sun, 6 Feb 2022 at 17:18, Axel Rau <Axel(dot)Rau(at)chaos1(dot)de> wrote:

> Thanks for your answer, Vijay,
>
> Perhaps I should mention, that logical replication was already running
> prior to upgrade.
> I just did a shutdown of the server at the subscriber (being upgraded).
> I did nothing at the publisher.
> The slot just went to inactive.
>
>
I just simulated an upgrade of subscriber (mine was 13 to 14 though).
after the upgrade, subscription just stops and is disabled.
looks like it gets disabled in the upgrade.
so i enable it first.
even though subscription is active, the relation which was in subscription
was removed from pg_subscription_rel,
so ofcourse it was not rcving updates from the publisher.
below is the initial walkthrough.

--------- post upgrade of subscriber
postgres=# select * from pg_stat_subscription;
subid | subname | pid | relid | received_lsn | last_msg_send_time |
last_msg_receipt_time | latest_end_lsn | latest_end_time
-------+---------+-----+-------+--------------+--------------------+-----------------------+----------------+-----------------
16403 | mysub | | | | |
| |
(1 row)

postgres=# select * from pg_subscription;
oid | subdbid | subname | subowner | subenabled | subbinary | substream
| subconninfo | subslotname | subsynccommit | subpublications
-------+---------+---------+----------+------------+-----------+-----------+-------------+-------------+---------------+-----------------
16403 | 16401 | mysub | 10 | f | f | f
| port=8001 | mysub | off | {mypub}
(1 row)

postgres=# select * from pg_subscription_rel;
srsubid | srrelid | srsubstate | srsublsn
---------+---------+------------+----------
(0 rows)

*# so i enable subscription ( but still relation not subscribing to changes
)*
postgres=# alter subscription mysub enable;
ALTER SUBSCRIPTION
postgres=# select * from pg_subscription;
oid | subdbid | subname | subowner | subenabled | subbinary | substream
| subconninfo | subslotname | subsynccommit | subpublications
-------+---------+---------+----------+------------+-----------+-----------+-------------+-------------+---------------+-----------------
16403 | 16401 | mysub | 10 | t | f | f
| port=8001 | mysub | off | {mypub}
(1 row)

postgres=# select * from pg_stat_subscription;
postgres=# \x
Expanded display is on.
postgres=# select * from pg_stat_subscription;
-[ RECORD 1 ]---------+---------------------------------
subid | 16403
subname | mysub
pid | 1726
relid |
received_lsn | 0/15E37F0
last_msg_send_time | 2022-02-07 00:11:31.623328+05:30
last_msg_receipt_time | 2022-02-07 00:11:31.623353+05:30
latest_end_lsn | 0/15E37F0
latest_end_time | 2022-02-07 00:11:31.623328+05:30

*#but still getting no changes as relation not in subscription *
postgres=# select * from pg_subscription_rel;
(0 rows)

postgres=# \dt
List of relations
-[ RECORD 1 ]----
Schema | public
Name | t
Type | table
Owner | postgres

*# so i try refresh publication, but that is like a re -sync, given the
table already has data, it errors due to pkey conflict and aborts.*
postgres=# alter subscription mysub refresh publication ;
ALTER SUBSCRIPTION
postgres=# select * from pg_subscription_rel;
-[ RECORD 1 ]-----
srsubid | 16403
srrelid | 16384
srsubstate | d *---- this is COPY*
srsublsn |

*# so finally, i truncate the table and re-sync :( and everything works,
but i think i am missing something here. i am pretty sure we *
*# can play with pg_replication_origin_advance
or pg_replication_slot_advance to move the lsn to continue subscription
without a sync, i'll have to spend some time to understand that*
*# but for now, in summary, subscription breaks in upgrade. After an
upgrade, we need to enable it back and possibly refresh publication
(re-sync) from scratch.*
*# but i am hoping i can be corrected by more experienced people here.*

postgres=# truncate table t;
TRUNCATE TABLE
postgres=# alter subscription mysub refresh publication ;
ALTER SUBSCRIPTION
postgres=# table t;
id
----
1
2
3
(3 rows)

#table re-enabled and is *ready*
postgres=# select * from pg_subscription_rel;
srsubid | srrelid | srsubstate | srsublsn
---------+---------+------------+-----------
16403 | 16384 | r | 0/15E3E28
(1 row)

> Should I have deleted the slot prior to upgrading?
>
> Axel
>
> Am 05.02.22 um 23:41 schrieb Vijaykumar Jain:
> > On Sun, 6 Feb 2022 at 02:20, Axel Rau <Axel(dot)Rau(at)chaos1(dot)de> wrote:
> >
> >> Am 03.02.22 um 17:06 schrieb Axel Rau:
> >>> Replication to another standby is working, but the slot of the upgraded
> >>> standby is inactive.
> >> Also the subscription has subenabled set to false.
> >> Enabling it, turns on the active bit at the publisher side, but does not
> >> start replication.
> >> Recreating the subscription does not help either.
> >>
> >> There should be a solution as logical replication is documented to
> >> support upgrade of the standby.
> >>
> >
> > i just performed an upgrade using pg_upgrade link mode from pg13 to pg14.
> >
> > the only additional step as mentioned in the docs was to create the
> > logical replication slot manually.
> > subscription started working just fine.
> >
> >
> > postgres(at)controller:/var/tmp/log$ pwd
> > /var/tmp/log
> > postgres(at)controller:/var/tmp/log$ /usr/lib/postgresql/14/bin/initdb -D
> > /var/tmp/log/testdb
> > postgres(at)controller:/var/tmp/log$ tail postgresql.conf >>
> > testdb/postgresql.conf
> > postgres(at)controller:/var/tmp/log$ tail testdb/postgresql.conf
> > # Add settings for extensions here
> > wal_level=logical
> > archive_mode = on
> > archive_command = '/bin/true'
> > max_wal_size = 48MB
> > min_wal_size = 32MB
> > shared_buffers = 32MB
> > port = 8001
> > max_logical_replication_workers = 10
> >
> > postgres(at)controller:/var/tmp/log$ /usr/lib/postgresql/14/bin/pg_ctl -D
> > testdb -l logfile start
> > waiting for server to start.... done
> > server started
> > postgres(at)controller:/var/tmp/log$ psql -p 8001
> > psql (14.1 (Ubuntu 14.1-2.pgdg20.04+1))
> > Type "help" for help.
> >
> > postgres=# create table t(id int primary key);
> > CREATE TABLE
> > postgres=# create subscription mysub connection 'port=5001' publication
> > mypub;
> > NOTICE: created replication slot "mysub" on publisher
> > CREATE SUBSCRIPTION
> > postgres=# table t;
> > id
> > ----
> > 1
> > 2
> > (2 rows)
> >
> > *-- after i do an upgrade it stop getting updates from publisher as no
> slot
> > on pg14*
> > *postgres=# select * from pg_stat_subscription;*
> > * subid | subname | pid | relid | received_lsn | last_msg_send_time |
> > last_msg_receipt_time | latest_end_lsn | latest_end_time*
> >
> *-------+---------+-----+-------+--------------+--------------------+-----------------------+----------------+-----------------*
> > * 16389 | mysub | | | | |
> > | |*
> > *(1 row)*
> >
> > *-- but when i create the logical replication slot on pg14 (see below),
> it
> > starts running just fine*
> > *postgres=# select * from pg_stat_subscription;*
> > *postgres=# \x*
> > *Expanded display is on.*
> > *postgres=# select * from pg_stat_subscription;*
> > *-[ RECORD 1 ]---------+---------------------------------*
> > *subid | 16389*
> > *subname | mysub*
> > *pid | 6099*
> > *relid |*
> > *received_lsn | 0/C1BC8E0*
> > *last_msg_send_time | 2022-02-06 03:59:07.272526+05:30*
> > *last_msg_receipt_time | 2022-02-06 03:59:07.272627+05:30*
> > *latest_end_lsn | 0/C1BC8E0*
> > *latest_end_time | 2022-02-06 03:59:07.272526+05:30*
> >
> > *postgres=# table t;*
> > *-[ RECORD 1 ]*
> > *id | 1*
> > *-[ RECORD 2 ]*
> > *id | 2*
> > *-[ RECORD 3 ]*
> > *id | 3*
> >
> >
> -------------------------------------------------------------------------------------------------
> > upgrade from pg13 to pg14
> > postgres(at)controller:~/temp/13$ tail postgresql.conf
> > # Add settings for extensions here
> > wal_level=logical
> > archive_mode = on
> > archive_command = '/bin/true'
> > max_wal_size = 48MB
> > min_wal_size = 32MB
> > shared_buffers = 32MB
> > port = 5001
> > max_logical_replication_workers = 10
> >
> > postgres(at)controller:~/temp/13$ initdb -D pg13
> > postgres(at)controller:~/temp/13$ cp postgresql.conf pg13/postgresql.conf
> > postgres(at)controller:~/temp/13$ pg_ctl -D pg13 -l logfile start
> > waiting for server to start.... done
> > server started
> > postgres(at)controller:~/temp/13$ psql -p 5001
> > psql (13.5 (Ubuntu 13.5-2.pgdg20.04+1))
> > Type "help" for help.
> >
> > postgres=# create table t(id int primary key);
> > CREATE TABLE
> > postgres=# insert into t values (1);
> > INSERT 0 1
> > postgres=# insert into t values (2);
> > INSERT 0 1
> > postgres=# checkpoint;
> > CHECKPOINT
> > postgres=# create publication mypub for table t;
> > CREATE PUBLICATION
> > postgres=# \x
> > Expanded display is on.
> > postgres=# select * from pg_replication_slots; -- note the details as we
> > need to create on pg14 after upgrade
> > -[ RECORD 1 ]-------+----------
> > slot_name | mysub
> > plugin | pgoutput
> > slot_type | logical
> > datoid | 13414
> > database | postgres
> > temporary | f
> > active | t
> > active_pid | 5332
> > xmin |
> > catalog_xmin | 489
> > restart_lsn | 0/15E7098
> > confirmed_flush_lsn | 0/15E70D0
> > wal_status | reserved
> > safe_wal_size |
> >
> > postgres=# \q
> > postgres(at)controller:~/temp/13$ pg_ctl -D pg13 -l logfile stop
> > waiting for server to shut down.... done
> > server stopped
> >
> >
> > ---------------------------------------
> > pg14
> >
> > postgres(at)controller:~/temp/14$ initdb -D
> /var/lib/postgresql/temp/14/pg14
> > #perform upgrade
> > postgres(at)controller:~/temp/14$
> > /usr/lib/postgresql/${NEW_PSQL}/bin/pg_upgrade -b
> > /usr/lib/postgresql/${OLD_PSQL}/bin/ -B
> > /usr/lib/postgresql/${NEW_PSQL}/bin/ -d
> > /var/lib/postgresql/temp/${OLD_PSQL}/pg${OLD_PSQL} -D
> > /var/lib/postgresql/temp//${NEW_PSQL}/pg${NEW_PSQL} -o "-c
> >
> config-file=/var/lib/postgresql/temp/${OLD_PSQL}/pg${OLD_PSQL}/postgresql.conf"
> > -O "-c
> >
> config-file=/var/lib/postgresql/temp/${NEW_PSQL}/pg${NEW_PSQL}/postgresql.conf"
> > --link
> > Performing Consistency Checks
> > -----------------------------
> > Checking cluster versions ok
> > Checking database user is the install user ok
> > Checking database connection settings ok
> > Checking for prepared transactions ok
> > Checking for system-defined composite types in user tables ok
> > Checking for reg* data types in user tables ok
> > Checking for contrib/isn with bigint-passing mismatch ok
> > Checking for user-defined encoding conversions ok
> > Checking for user-defined postfix operators ok
> > Creating dump of global objects ok
> > Creating dump of database schemas
> > ok
> > Checking for presence of required libraries ok
> > Checking database user is the install user ok
> > Checking for prepared transactions ok
> > Checking for new cluster tablespace directories ok
> >
> > If pg_upgrade fails after this point, you must re-initdb the
> > new cluster before continuing.
> >
> > Performing Upgrade
> > ------------------
> > Analyzing all rows in the new cluster ok
> > Freezing all rows in the new cluster ok
> > Deleting files from new pg_xact ok
> > Copying old pg_xact to new server ok
> > Setting oldest XID for new cluster ok
> > Setting next transaction ID and epoch for new cluster ok
> > Deleting files from new pg_multixact/offsets ok
> > Copying old pg_multixact/offsets to new server ok
> > Deleting files from new pg_multixact/members ok
> > Copying old pg_multixact/members to new server ok
> > Setting next multixact ID and offset for new cluster ok
> > Resetting WAL archives ok
> > Setting frozenxid and minmxid counters in new cluster ok
> > Restoring global objects in the new cluster ok
> > Restoring database schemas in the new cluster
> > ok
> > Adding ".old" suffix to old global/pg_control ok
> >
> > If you want to start the old cluster, you will need to remove
> > the ".old" suffix from
> > /var/lib/postgresql/temp/13/pg13/global/pg_control.old.
> > Because "link" mode was used, the old cluster cannot be safely
> > started once the new cluster has been started.
> >
> > Linking user relation files
> > ok
> > Setting next OID for new cluster ok
> > Sync data directory to disk ok
> > Creating script to delete old cluster ok
> > Checking for extension updates ok
> >
> > Upgrade Complete
> > ----------------
> > Optimizer statistics are not transferred by pg_upgrade.
> > Once you start the new server, consider running:
> > /usr/lib/postgresql/14/bin/vacuumdb --all --analyze-in-stages
> >
> > Running this script will delete the old cluster's data files:
> > ./delete_old_cluster.sh
> >
> > #copy the conf file
> > postgres(at)controller:~/temp/14$ cp postgresql.conf pg14/
> > postgres(at)controller:~/temp/14$ pg_ctl -D pg14 -l logfile start
> > waiting for server to start.... done
> > server started
> >
> > postgres(at)controller:~/temp/14$ /usr/lib/postgresql/14/bin/vacuumdb -p
> 5001
> > --all --analyze-in-stages
> > vacuumdb: processing database "postgres": Generating minimal optimizer
> > statistics (1 target)
> > vacuumdb: processing database "template1": Generating minimal optimizer
> > statistics (1 target)
> > vacuumdb: processing database "postgres": Generating medium optimizer
> > statistics (10 targets)
> > vacuumdb: processing database "template1": Generating medium optimizer
> > statistics (10 targets)
> > vacuumdb: processing database "postgres": Generating default (full)
> > optimizer statistics
> > vacuumdb: processing database "template1": Generating default (full)
> > optimizer statistics
> > postgres(at)controller:~/temp/14$ psql -p 5001
> >
> >
> > ### this is the only post action after upgrade i had to do (since you
> would
> > use ansible, you can automate this my creating slots manually or via if
> > exists and not via create subscription
> > psql (14.1 (Ubuntu 14.1-2.pgdg20.04+1))
> > Type "help" for help.
> >
> > postgres=# SELECT * FROM pg_create_logical_replication_slot('mysub',
> > 'pgoutput', false, true);
> > slot_name | lsn
> > -----------+-----------
> > mysub | 0/C19E358
> > (1 row)
> >
> > postgres=# select * from pg_replication_slots;
> > postgres=# \x
> > Expanded display is on.
> > postgres=# select * from pg_replication_slots;
> > -[ RECORD 1 ]-------+----------
> > slot_name | mysub
> > plugin | pgoutput
> > slot_type | logical
> > datoid | 16401
> > database | postgres
> > temporary | f
> > active | t
> > active_pid | 6100
> > xmin |
> > catalog_xmin | 760
> > restart_lsn | 0/C19E320
> > confirmed_flush_lsn | 0/C19E358
> > wal_status | reserved
> > safe_wal_size |
> > two_phase | t
> >
> >
> > --- the below changes get pushed to subscriber just fine. i did not
> create
> > publication.
> > postgres=# insert into t values (3);
> > INSERT 0 1
> > postgres=# \q
> > postgres(at)controller:~/temp/14$ pg_ctl -D pg14 -l logfile stop
> > waiting for server to shut down.... done
> > server stopped
> >
> >
>
>
> --
> PGP-Key: CDE74120 ☀ computing @ chaos claudius
>

--
Thanks,
Vijay
LinkedIn - Vijaykumar Jain <https://www.linkedin.com/in/vijaykumarjain/>

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Vijaykumar Jain 2022-02-06 19:40:50 Re: restarting logical replication after upgrading standby
Previous Message David G. Johnston 2022-02-06 17:49:50 Re: regarding remote connection