Re: Subscription stuck at initialize state

From: Vijaykumar Jain <vijaykumarjain(dot)github(at)gmail(dot)com>
To: Abhishek Bhola <abhishek(dot)bhola(at)japannext(dot)co(dot)jp>
Cc: Steve Baldwin <steve(dot)baldwin(at)gmail(dot)com>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Subscription stuck at initialize state
Date: 2022-02-03 18:04:46
Message-ID: CAM+6J962X11n9Pic_3izg5GsUtvqAdUQAVCebpbAQBuHvB+fwQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

trimming the email, to avoid noise.

I spent a lot of time trying multiple options/combinations and finally
managed to replicate your debug output.
this is when i have ,

postgres=# show max_logical_replication_workers;
* max_logical_replication_workers*
---------------------------------
* 2*
(1 row)

on publisher
postgres(at)controller:~$ psql -p 5001
psql (14.1 (Ubuntu 14.1-2.pgdg20.04+1))
Type "help" for help.

postgres=# create table t_bytea(id int primary key, somebytea bytea);
CREATE TABLE
postgres=# create table t1(id int);
CREATE TABLE
postgres=# create table t2(id int);
CREATE TABLE
postgres=# create table t3(id int);
CREATE TABLE
postgres=# create publication mypub1 for table t1;
CREATE PUBLICATION
postgres=# create publication mypub2 for table t2;
CREATE PUBLICATION
postgres=# create publication mypub3 for table t3;
CREATE PUBLICATION
postgres=# create publication mypub4 for table t3;
CREATE PUBLICATION
postgres=# create publication mypub5 for table t_bytea;
CREATE PUBLICATION
postgres=# insert into t_bytea select x,repeat(repeat('xxx', 100),
1000)::bytea from generate_series(1, 1000) x;
INSERT 0 1000

on subscriber
postgres(at)controller:~$ psql -p 5002
psql (14.1 (Ubuntu 14.1-2.pgdg20.04+1))
Type "help" for help.

postgres=# create table t_bytea(id int primary key, somebytea bytea);
CREATE TABLE
postgres=# create table t1(id int);
CREATE TABLE
postgres=# create table t2(id int);
CREATE TABLE
postgres=# create table t3(id int);
CREATE TABLE
postgres=# create table t4(id int);
CREATE TABLE
postgres=# create table t5(id int);
CREATE TABLE
postgres=# create subscription mysub1 connection 'port=5001' publication
mypub1;
NOTICE: created replication slot "mysub1" on publisher
CREATE SUBSCRIPTION
postgres=# create subscription mysub2 connection 'port=5001' publication
mypub2;
NOTICE: created replication slot "mysub2" on publisher
CREATE SUBSCRIPTION
postgres=# create subscription mysub3 connection 'port=5001' publication
mypub3;
NOTICE: created replication slot "mysub3" on publisher
CREATE SUBSCRIPTION
postgres=# create subscription mysub4 connection 'port=5001' publication
mypub4;
NOTICE: created replication slot "mysub4" on publisher
CREATE SUBSCRIPTION
postgres=# create subscription mysub5 connection 'port=5001' publication
mypub5;
NOTICE: created replication slot "mysub5" on publisher
CREATE SUBSCRIPTION
*postgres=# select count(1) from t_bytea;*
* count*
*-------*
* 0*
*(1 row)*

postgres=# table pg_subscription_rel;
srsubid | srrelid | srsubstate | srsublsn
---------+---------+------------+-----------
16406 | 16391 | r | 0/1722838
* 16407 | 16394 | i |*
* 16408 | 16397 | i |*
* 16409 | 16397 | i |*
* 16410 | 16384 | i |*
(5 rows)

# as expected no data in t_bytea (as it could not get any worker)
postgres=# select count(1) from t_bytea;
count
-------
0
(1 row)

but logs clearly state the problem
2022-02-03 23:18:31.107 IST [3430] LOG: logical replication table
synchronization worker for subscription "mysub1", table "t1" has started
2022-02-03 23:18:31.138 IST [3430] LOG: logical replication table
synchronization worker for subscription "mysub1", table "t1" has finished
2022-02-03 23:18:40.730 IST [3433] LOG: logical replication apply worker
for subscription "mysub2" has started
*2022-02-03 23:18:40.737 IST [3433] WARNING: out of logical replication
worker slots*
*2022-02-03 23:18:40.737 IST [3433] HINT: You might need to increase
max_logical_replication_workers.*
*2022-02-03 23:18:45.865 IST [3433] WARNING: out of logical replication
worker slots*

*#publisher logs*
2022-02-03 23:18:31.096 IST [3427] STATEMENT: CREATE_REPLICATION_SLOT
"mysub1" LOGICAL pgoutput NOEXPORT_SNAPSHOT
2022-02-03 23:18:31.106 IST [3429] LOG: starting logical decoding for slot
"mysub1"
2022-02-03 23:18:31.106 IST [3429] DETAIL: Streaming transactions
committing after 0/1722800, reading WAL from 0/17227C8.
2022-02-03 23:18:31.106 IST [3429] STATEMENT: START_REPLICATION SLOT
"mysub1" LOGICAL 0/0 (proto_version '2', publication_names '"mypub1"')
2022-02-03 23:18:31.106 IST [3429] LOG: logical decoding found consistent
point at 0/17227C8
2022-02-03 23:18:31.106 IST [3429] DETAIL: There are no running
transactions.
2022-02-03 23:18:31.106 IST [3429] STATEMENT: START_REPLICATION SLOT
"mysub1" LOGICAL 0/0 (proto_version '2', publication_names '"mypub1"')
2022-02-03 23:18:31.129 IST [3431] LOG: logical decoding found consistent
point at 0/1722800
2022-02-03 23:18:31.129 IST [3431] DETAIL: There are no running
transactions.
2022-02-03 23:18:31.129 IST [3431] STATEMENT: CREATE_REPLICATION_SLOT
"pg_16406_sync_16391_7060540926182153512" LOGICAL pgoutput USE_SNAPSHOT
2022-02-03 23:18:31.135 IST [3431] LOG: starting logical decoding for slot
"pg_16406_sync_16391_7060540926182153512"
2022-02-03 23:18:31.135 IST [3431] DETAIL: Streaming transactions
committing after 0/1722838, reading WAL from 0/1722800.
2022-02-03 23:18:31.135 IST [3431] STATEMENT: START_REPLICATION SLOT
"pg_16406_sync_16391_7060540926182153512" LOGICAL 0/1722838 (proto_version
'2', publication_names '"mypub1"')
2022-02-03 23:18:35.718 IST [3432] LOG: logical decoding found consistent
point at 0/1722838
2022-02-03 23:18:35.718 IST [3432] DETAIL: There are no running
transactions.
2022-02-03 23:18:35.718 IST [3432] STATEMENT: CREATE_REPLICATION_SLOT
"mysub2" LOGICAL pgoutput NOEXPORT_SNAPSHOT
2022-02-03 23:18:40.737 IST [3434] LOG: starting logical decoding for slot
"mysub2"
2022-02-03 23:18:40.737 IST [3434] DETAIL: Streaming transactions
committing after 0/1722870, reading WAL from 0/1722838.
2022-02-03 23:18:40.737 IST [3434] STATEMENT: START_REPLICATION SLOT
"mysub2" LOGICAL 0/0 (proto_version '2', publication_names '"mypub2"')
2022-02-03 23:18:40.737 IST [3434] LOG: logical decoding found consistent
point at 0/1722838
2022-02-03 23:18:40.737 IST [3434] DETAIL: There are no running
transactions.
2022-02-03 23:18:40.737 IST [3434] STATEMENT: START_REPLICATION SLOT
"mysub2" LOGICAL 0/0 (proto_version '2', publication_names '"mypub2"')
2022-02-03 23:18:40.857 IST [3435] LOG: logical decoding found consistent
point at 0/1722870
2022-02-03 23:18:40.857 IST [3435] DETAIL: There are no running
transactions.
2022-02-03 23:18:40.857 IST [3435] STATEMENT: CREATE_REPLICATION_SLOT
"mysub3" LOGICAL pgoutput NOEXPORT_SNAPSHOT
2022-02-03 23:18:46.091 IST [3437] LOG: logical decoding found consistent
point at 0/17228A8
2022-02-03 23:18:46.091 IST [3437] DETAIL: There are no running
transactions.
2022-02-03 23:18:46.091 IST [3437] STATEMENT: CREATE_REPLICATION_SLOT
"mysub4" LOGICAL pgoutput NOEXPORT_SNAPSHOT
2022-02-03 23:18:51.941 IST [3439] LOG: logical decoding found consistent
point at 0/17228E0
2022-02-03 23:18:51.941 IST [3439] DETAIL: There are no running
transactions.
2022-02-03 23:18:51.941 IST [3439] STATEMENT: CREATE_REPLICATION_SLOT
"mysub5" LOGICAL pgoutput NOEXPORT_SNAPSHOT

*# solution*
so i bump the max_logical_replication_workers = 10 and restart the db
servers.
as there are workers available, replication catches up and tables in sync.

# on subscriber
psql (14.1 (Ubuntu 14.1-2.pgdg20.04+1))
Type "help" for help.

postgres=# select count(1) from t_bytea;
count
-------
0
(1 row)

postgres=# table pg_subscription_rel;
srsubid | srrelid | srsubstate | srsublsn
---------+---------+------------+-----------
16406 | 16391 | r | 0/1722838
16407 | 16394 | r | 0/2000110
16408 | 16397 | r | 0/2000148
16409 | 16397 | r | 0/2000148
16410 | 16384 | r | 0/20001B8
(5 rows)

postgres=# select count(1) from t_bytea;
count
-------
1000
(1 row)

so two things,
can you check logs for warning messages of out of *logical replication
worker slots *on subscribers ?
can you try bumping them, and check if it catches up (srsubstate should
either be *r(READY)* or *d (INITIAL COPY))*

also monitor logs for both publisher and subscriber.

I also figured out all the debugging steps I requested earlier were
useless.

--
Thanks,
Vijay
LinkedIn - Vijaykumar Jain <https://www.linkedin.com/in/vijaykumarjain/>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Matthias Apitz 2022-02-03 19:11:35 Re: sort order for UTF-8 char column with Japanese UTF-8
Previous Message Tom Lane 2022-02-03 16:36:27 Re: Undetected Deadlock