| From: | Don Seiler <don(at)seiler(dot)us> |
|---|---|
| To: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
| Cc: | Postgres General <pgsql-general(at)postgresql(dot)org> |
| Subject: | Re: Runaway Initial Table Syncs in Logical Replication? |
| Date: | 2023-08-04 14:28:48 |
| Message-ID: | CAHJZqBAkg27MFv9xbMTuJvcL25EhS2mzxynDr7bLFPcdqZZ=ew@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-general |
On Thu, Aug 3, 2023 at 9:12 PM Zhijie Hou (Fujitsu) <houzj(dot)fnst(at)fujitsu(dot)com>
wrote:
>
> Just to confirm, which subversion of PG15 are you using on the subscriber
> side ?
> Is it 15.3 ? I am confirming because I recall we fixed similar bug before
> in commit [1].
>
Yes. Specifically on the target/subscriber:
# select version();
version
-----------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 15.3 (Ubuntu 15.3-1.pgdg22.04+1) on x86_64-pc-linux-gnu,
compiled by gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0, 64-bit
(1 row)
Installed on Ubuntu 22.04 via the PGDG apt repo.
The source/publisher side is:
# select version();
version
-----------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 12.15 (Ubuntu 12.15-1.pgdg18.04+1) on x86_64-pc-linux-gnu,
compiled by gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit
(1 row)
Are there any ERRORs(including erros other than the slots number) in the
> log of publisher and subscriber.
> It would be helpful to see the log in case there are some other ERRORs
> that could cause problem.
>
On the subscriber I see a series of errors similar to this:
2023-08-03 18:22:50.196 UTC [239889] LOG: logical replication table
synchronization worker for subscription "sub01", table "users" has started
2023-08-03 18:22:50.209 UTC [239889] ERROR: could not create replication
slot "pg_19742_sync_18602_7263122209699118815": ERROR: all replication
slots are in use
HINT: Free one or increase max_replication_slots.
on the publisher I see this corresponding message:
2023-08-03 18:22:50.199 UTC [13884] [unknown](at)[unknown] - [unknown] LOG:
connection received: host=10.228.88.74 port=47338
2023-08-03 18:22:50.207 UTC [13884] migrator(at)foo - [unknown] LOG:
replication connection authorized: user=migrator
application_name=pg_19742_sync_18602_7263122209699118815 SSL enabled
(protocol=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384, bits=256, compression=off)
2023-08-03 18:22:50.209 UTC [13884] migrator(at)foo -
pg_19742_sync_18602_7263122209699118815 ERROR: all replication slots are
in use
2023-08-03 18:22:50.209 UTC [13884] migrator(at)foo -
pg_19742_sync_18602_7263122209699118815 HINT: Free one or increase
max_replication_slots.
2023-08-03 18:22:50.209 UTC [13884] migrator(at)foo -
pg_19742_sync_18602_7263122209699118815 STATEMENT: CREATE_REPLICATION_SLOT
"pg_19742_sync_18602_7263122209699118815" LOGICAL pgoutput USE_SNAPSHOT
2023-08-03 18:22:50.209 UTC [13884] migrator(at)foo -
pg_19742_sync_18602_7263122209699118815 LOG: disconnection: session time:
0:00:00.009 user=migrator database=foo host=10.228.88.74 port=47338
and later they include this message about trying to drop them first on the
subscriber:
2023-08-03 18:47:39.485 UTC [263502] LOG: logical replication table
synchronization worker for subscription "sub01", table "foo_accounts" has
started
2023-08-03 18:47:39.496 UTC [263502] LOG: could not drop replication slot
"pg_19742_sync_17238_7263122209699118815" on publisher: ERROR: replication
slot "pg_19742_sync_17238_7263122209699118815" does not exist
2023-08-03 18:47:39.497 UTC [263502] ERROR: could not create replication
slot "pg_19742_sync_17238_7263122209699118815": ERROR: all replication
slots are in use
HINT: Free one or increase max_replication_slots.
and on the publisher side for that:
2023-08-03 18:47:39.487 UTC [3232] [unknown](at)[unknown] - [unknown] LOG:
connection received: host=10.228.88.74 port=17390
2023-08-03 18:47:39.494 UTC [3232] migrator(at)foo - [unknown] LOG:
replication connection authorized: user=migrator
application_name=pg_19742_sync_17238_7263122209699118815 SSL enabled
(protocol=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384, bits=256, compression=off)
2023-08-03 18:47:39.495 UTC [3232] migrator(at)foo -
pg_19742_sync_17238_7263122209699118815 ERROR: replication slot
"pg_19742_sync_17238_7263122209699118815" does not exist
2023-08-03 18:47:39.495 UTC [3232] migrator(at)foo -
pg_19742_sync_17238_7263122209699118815 STATEMENT: DROP_REPLICATION_SLOT
pg_19742_sync_17238_7263122209699118815 WAIT
2023-08-03 18:47:39.496 UTC [3232] migrator(at)foo -
pg_19742_sync_17238_7263122209699118815 ERROR: all replication slots are
in use
2023-08-03 18:47:39.496 UTC [3232] migrator(at)foo -
pg_19742_sync_17238_7263122209699118815 HINT: Free one or increase
max_replication_slots.
2023-08-03 18:47:39.496 UTC [3232] migrator(at)foo -
pg_19742_sync_17238_7263122209699118815 STATEMENT: CREATE_REPLICATION_SLOT
"pg_19742_sync_17238_7263122209699118815" LOGICAL pgoutput USE_SNAPSHOT
2023-08-03 18:47:39.496 UTC [3232] migrator(at)foo -
pg_19742_sync_17238_7263122209699118815 LOG: disconnection: session time:
0:00:00.009 user=migrator datab
ase=foo host=10.228.88.74 port=17390
and then this from a different pid, maybe some kind of cleanup process? It
was doing this for what seemed like all of the sync slots:
2023-08-03 18:47:47.975 UTC [3303] migrator(at)foo - sub01 ERROR: replication
slot "pg_19742_sync_17238_7263122209699118815" does not exist
2023-08-03 18:47:47.975 UTC [3303] migrator(at)foo - sub01 STATEMENT:
DROP_REPLICATION_SLOT pg_19742_sync_17238_7263122209699118815 WAIT
--
Don Seiler
www.seiler.us
| From | Date | Subject | |
|---|---|---|---|
| Next Message | jacktby jacktby | 2023-08-05 13:46:33 | How does pg parse 'select '(1,2)'' |
| Previous Message | Karsten Hilbert | 2023-08-04 12:06:18 | Re: question on auto_explain |