BDR not catching up

From: cchee-ob <carter(dot)chee(at)objectbrains(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: BDR not catching up
Date: 2016-03-11 23:33:10
Message-ID: 1457739190408-5892335.post@n5.nabble.com
Lists: pgsql-general

I'm getting the following messages repeatedly on the UDR node that I just added today.
Is there any way to get it to start applying?
svp2=# select * from bdr.bdr_nodes;
     node_sysid      | node_timeline | node_dboid | node_status |    node_name    |            node_local_dsn             |             node_init_from_dsn
---------------------+---------------+------------+-------------+-----------------+---------------------------------------+-------------------------------------------
 6206439726032130602 |             1 |      16385 | r           | UDR1            |                                       |
 6260914790689848233 |             1 |      16385 | c           | UDR1-subscriber | host=10.253.0.8 port=5432 dbname=svp2 | host=10.253.228.105 port=5432 dbname=svp2
(2 rows)

t=2016-03-11 15:23:51 PST d= h= p=7226 a=DEBUG: 00000: per-db worker for node bdr (6260914790689848233,1,16385,) starting
t=2016-03-11 15:23:51 PST d= h= p=7226 a=LOCATION: bdr_perdb_worker_main, bdr_perdb.c:707
t=2016-03-11 15:23:51 PST d= h= p=7226 a=DEBUG: 00000: init_replica init from remote host=10.253.228.105 port=5432 dbname=svp2
t=2016-03-11 15:23:51 PST d= h= p=7226 a=LOCATION: bdr_init_replica, bdr_init_replica.c:830
t=2016-03-11 15:23:51 PST d= h= p=7226 a=DEBUG: 00000: found valid replication identifier 1
t=2016-03-11 15:23:51 PST d= h= p=7226 a=LOCATION: bdr_establish_connection_and_slot, bdr.c:572
t=2016-03-11 15:23:51 PST d= h= p=7226 a=DEBUG: 00000: launching catchup mode apply worker
t=2016-03-11 15:23:51 PST d= h= p=7226 a=LOCATION: bdr_init_replica, bdr_init_replica.c:1043
t=2016-03-11 15:23:51 PST d= h= p=7226 a=DEBUG: 00000: Registering bdr apply catchup worker for bdr (6206439726032130602,1,16385,) to lsn 19E/10AC4F0
t=2016-03-11 15:23:51 PST d= h= p=7226 a=LOCATION: bdr_catchup_to_lsn, bdr_init_replica.c:1161
t=2016-03-11 15:23:51 PST d= h= p=4718 a=LOG: 00000: registering background worker "bdr: catchup apply to 19E/10AC4F0"
t=2016-03-11 15:23:51 PST d= h= p=4718 a=LOCATION: BackgroundWorkerStateChange, bgworker.c:347
t=2016-03-11 15:23:51 PST d= h= p=4718 a=LOG: 00000: starting background worker process "bdr: catchup apply to 19E/10AC4F0"
t=2016-03-11 15:23:51 PST d= h= p=4718 a=LOCATION: do_start_bgworker, postmaster.c:5412
t=2016-03-11 15:23:51 PST d= h= p=7227 a=NOTICE: 00000: version "1.0" of extension "btree_gist" is already installed
t=2016-03-11 15:23:51 PST d= h= p=7227 a=LOCATION: ExecAlterExtensionStmt, extension.c:2700
t=2016-03-11 15:23:51 PST d= h= p=7227 a=NOTICE: 00000: version "0.9.2.0" of extension "bdr" is already installed
t=2016-03-11 15:23:51 PST d= h= p=7227 a=LOCATION: ExecAlterExtensionStmt, extension.c:2700
t=2016-03-11 15:23:51 PST d= h= p=7227 a=NOTICE: 42622: identifier "bdr (6260914790689848233,1,16385,): apply catchup up to 19E/10AC4F0" will be truncated to "bdr (6260914790689848233,1,16385,): apply catchup up to 19E/10A"
t=2016-03-11 15:23:51 PST d= h= p=7227 a=LOCATION: truncate_identifier, scansup.c:195
t=2016-03-11 15:23:51 PST d= h= p=7227 a=DEBUG: 00000: found valid replication identifier 1
t=2016-03-11 15:23:51 PST d= h= p=7227 a=LOCATION: bdr_establish_connection_and_slot, bdr.c:572
t=2016-03-11 15:23:51 PST d= h= p=7227 a=INFO: 00000: starting up replication from 1 at 19D/D204D0C8
t=2016-03-11 15:23:51 PST d= h= p=7227 a=LOCATION: bdr_apply_main, bdr_apply.c:2550
t=2016-03-11 15:23:51 PST d= h= p=7227 a=DEBUG: 00000: bdr_apply: BEGIN origin(source, orig_lsn, timestamp): 19D/D204D3A0, 2016-03-11 13:49:47.293208-08
t=2016-03-11 15:23:51 PST d= h= p=7227 a=LOCATION: process_remote_begin, bdr_apply.c:198
t=2016-03-11 15:23:51 PST d= h= p=7227 a=ERROR: XX000: tuple natts mismatch, 26 vs 28
t=2016-03-11 15:23:51 PST d= h= p=7227 a=LOCATION: read_tuple_parts, bdr_apply.c:1892
t=2016-03-11 15:23:51 PST d= h= p=4718 a=LOG: 00000: worker process: bdr: catchup apply to 19E/10AC4F0 (PID 7227) exited with exit code 1
t=2016-03-11 15:23:51 PST d= h= p=4718 a=LOCATION: LogChildExit, postmaster.c:3325
t=2016-03-11 15:23:51 PST d= h= p=4718 a=LOG: 00000: unregistering background worker "bdr: catchup apply to 19E/10AC4F0"
t=2016-03-11 15:23:51 PST d= h= p=4718 a=LOCATION: ForgetBackgroundWorker, bgworker.c:376
t=2016-03-11 15:23:52 PST d= h= p=7226 a=ERROR: XX000: catchup worker exited before catching up to target LSN 19E/10AC4F0
t=2016-03-11 15:23:52 PST d= h= p=7226 a=LOCATION: bdr_catchup_to_lsn, bdr_init_replica.c:1273
t=2016-03-11 15:23:52 PST d= h= p=4718 a=LOG: 00000: worker process: bdr db: svp2 (PID 7226) exited with exit code 1
t=2016-03-11 15:23:52 PST d= h= p=4718 a=LOCATION: LogChildExit, postmaster.c:3325
t=2016-03-11 15:23:54 PST d= h= p=7228 a=DEBUG: 00000: autovacuum: processing database "bdr_supervisordb"
t=2016-03-11 15:23:54 PST d= h= p=7228 a=LOCATION: AutoVacWorkerMain, autovacuum.c:1684
t=2016-03-11 15:23:57 PST d= h= p=4718 a=LOG: 00000: starting background worker process "bdr db: svp2"
t=2016-03-11 15:23:57 PST d= h= p=4718 a=LOCATION: do_start_bgworker, postmaster.c:5412
t=2016-03-11 15:23:57 PST d= h= p=7229 a=NOTICE: 00000: version "1.0" of extension "btree_gist" is already installed
t=2016-03-11 15:23:57 PST d= h= p=7229 a=LOCATION: ExecAlterExtensionStmt, extension.c:2700
t=2016-03-11 15:23:57 PST d= h= p=7229 a=NOTICE: 00000: version "0.9.2.0" of extension "bdr" is already installed
t=2016-03-11 15:23:57 PST d= h= p=7229 a=LOCATION: ExecAlterExtensionStmt, extension.c:2700
t=2016-03-11 15:23:57 PST d= h= p=7229 a=DEBUG: 00000: per-db worker for node bdr (6260914790689848233,1,16385,) starting
t=2016-03-11 15:23:57 PST d= h= p=7229 a=LOCATION: bdr_perdb_worker_main, bdr_perdb.c:707
t=2016-03-11 15:23:57 PST d= h= p=7229 a=DEBUG: 00000: init_replica init from remote host=10.253.228.105 port=5432 dbname=svp2
t=2016-03-11 15:23:57 PST d= h= p=7229 a=LOCATION: bdr_init_replica, bdr_init_replica.c:830
t=2016-03-11 15:23:57 PST d= h= p=7229 a=DEBUG: 00000: found valid replication identifier 1
t=2016-03-11 15:23:57 PST d= h= p=7229 a=LOCATION: bdr_establish_connection_and_slot, bdr.c:572
t=2016-03-11 15:23:57 PST d= h= p=7229 a=DEBUG: 00000: launching catchup mode apply worker
t=2016-03-11 15:23:57 PST d= h= p=7229 a=LOCATION: bdr_init_replica, bdr_init_replica.c:1043
t=2016-03-11 15:23:57 PST d= h= p=7229 a=DEBUG: 00000: Registering bdr apply catchup worker for bdr (6206439726032130602,1,16385,) to lsn 19E/10BA488
t=2016-03-11 15:23:57 PST d= h= p=7229 a=LOCATION: bdr_catchup_to_lsn, bdr_init_replica.c:1161
t=2016-03-11 15:23:57 PST d= h= p=4718 a=LOG: 00000: registering background worker "bdr: catchup apply to 19E/10BA488"
t=2016-03-11 15:23:57 PST d= h= p=4718 a=LOCATION: BackgroundWorkerStateChange, bgworker.c:347
t=2016-03-11 15:23:57 PST d= h= p=4718 a=LOG: 00000: starting background worker process "bdr: catchup apply to 19E/10BA488"
t=2016-03-11 15:23:57 PST d= h= p=4718 a=LOCATION: do_start_bgworker, postmaster.c:5412
t=2016-03-11 15:23:57 PST d= h= p=7230 a=NOTICE: 00000: version "1.0" of extension "btree_gist" is already installed
t=2016-03-11 15:23:57 PST d= h= p=7230 a=LOCATION: ExecAlterExtensionStmt, extension.c:2700
t=2016-03-11 15:23:57 PST d= h= p=7230 a=NOTICE: 00000: version "0.9.2.0" of extension "bdr" is already installed
t=2016-03-11 15:23:57 PST d= h= p=7230 a=LOCATION: ExecAlterExtensionStmt, extension.c:2700
t=2016-03-11 15:23:57 PST d= h= p=7230 a=NOTICE: 42622: identifier "bdr (6260914790689848233,1,16385,): apply catchup up to 19E/10BA488" will be truncated to "bdr (6260914790689848233,1,16385,): apply catchup up to 19E/10B"
t=2016-03-11 15:23:57 PST d= h= p=7230 a=LOCATION: truncate_identifier, scansup.c:195
t=2016-03-11 15:23:57 PST d= h= p=7230 a=DEBUG: 00000: found valid replication identifier 1
t=2016-03-11 15:23:57 PST d= h= p=7230 a=LOCATION: bdr_establish_connection_and_slot, bdr.c:572
t=2016-03-11 15:23:57 PST d= h= p=7230 a=INFO: 00000: starting up replication from 1 at 19D/D204D0C8
t=2016-03-11 15:23:57 PST d= h= p=7230 a=LOCATION: bdr_apply_main, bdr_apply.c:2550
t=2016-03-11 15:23:57 PST d= h= p=7230 a=DEBUG: 00000: bdr_apply: BEGIN origin(source, orig_lsn, timestamp): 19D/D204D3A0, 2016-03-11 13:49:47.293208-08
t=2016-03-11 15:23:57 PST d= h= p=7230 a=LOCATION: process_remote_begin, bdr_apply.c:198
t=2016-03-11 15:23:57 PST d= h= p=7230 a=ERROR: XX000: tuple natts mismatch, 26 vs 28
t=2016-03-11 15:23:57 PST d= h= p=7230 a=LOCATION: read_tuple_parts, bdr_apply.c:1892
t=2016-03-11 15:23:57 PST d= h= p=4718 a=LOG: 00000: worker process: bdr: catchup apply to 19E/10BA488 (PID 7230) exited with exit code 1
t=2016-03-11 15:23:57 PST d= h= p=4718 a=LOCATION: LogChildExit, postmaster.c:3325
t=2016-03-11 15:23:57 PST d= h= p=4718 a=LOG: 00000: unregistering background worker "bdr: catchup apply to 19E/10BA488"
t=2016-03-11 15:23:57 PST d= h= p=4718 a=LOCATION: ForgetBackgroundWorker, bgworker.c:376
t=2016-03-11 15:23:58 PST d= h= p=7229 a=ERROR: XX000: catchup worker exited before catching up to target LSN 19E/10BA488
t=2016-03-11 15:23:58 PST d= h= p=7229 a=LOCATION: bdr_catchup_to_lsn, bdr_init_replica.c:1273
t=2016-03-11 15:23:58 PST d= h= p=4718 a=LOG: 00000: worker process: bdr db: svp2 (PID 7229) exited with exit code 1
t=2016-03-11 15:23:58 PST d= h= p=4718 a=LOCATION: LogChildExit, postmaster.c:3325
t=2016-03-11 15:24:03 PST d= h= p=4718 a=LOG: 00000: starting background worker process "bdr db: svp2"
t=2016-03-11 15:24:03 PST d= h= p=4718 a=LOCATION: do_start_bgworker, postmaster.c:5412
t=2016-03-11 15:24:03 PST d= h= p=7231 a=NOTICE: 00000: version "1.0" of extension "btree_gist" is already installed
t=2016-03-11 15:24:03 PST d= h= p=7231 a=LOCATION: ExecAlterExtensionStmt, extension.c:2700
t=2016-03-11 15:24:03 PST d= h= p=7231 a=NOTICE: 00000: version "0.9.2.0" of extension "bdr" is already installed
t=2016-03-11 15:24:03 PST d= h= p=7231 a=LOCATION: ExecAlterExtensionStmt, extension.c:2700
t=2016-03-11 15:24:03 PST d= h= p=7231 a=DEBUG: 00000: per-db worker for node bdr (6260914790689848233,1,16385,) starting
t=2016-03-11 15:24:03 PST d= h= p=7231 a=LOCATION: bdr_perdb_worker_main, bdr_perdb.c:707
t=2016-03-11 15:24:03 PST d= h= p=7231 a=DEBUG: 00000: init_replica init from remote host=10.253.228.105 port=5432 dbname=svp2
t=2016-03-11 15:24:03 PST d= h= p=7231 a=LOCATION: bdr_init_replica, bdr_init_replica.c:830
t=2016-03-11 15:24:03 PST d= h= p=7231 a=DEBUG: 00000: found valid replication identifier 1
t=2016-03-11 15:24:03 PST d= h= p=7231 a=LOCATION: bdr_establish_connection_and_slot, bdr.c:572
t=2016-03-11 15:24:03 PST d= h= p=7231 a=DEBUG: 00000: launching catchup mode apply worker
t=2016-03-11 15:24:03 PST d= h= p=7231 a=LOCATION: bdr_init_replica, bdr_init_replica.c:1043
t=2016-03-11 15:24:03 PST d= h= p=7231 a=DEBUG: 00000: Registering bdr apply catchup worker for bdr (6206439726032130602,1,16385,) to lsn 19E/10E9D58
t=2016-03-11 15:24:03 PST d= h= p=7231 a=LOCATION: bdr_catchup_to_lsn, bdr_init_replica.c:1161
t=2016-03-11 15:24:03 PST d= h= p=4718 a=LOG: 00000: registering background worker "bdr: catchup apply to 19E/10E9D58"
t=2016-03-11 15:24:03 PST d= h= p=4718 a=LOCATION: BackgroundWorkerStateChange, bgworker.c:347
t=2016-03-11 15:24:03 PST d= h= p=4718 a=LOG: 00000: starting background worker process "bdr: catchup apply to 19E/10E9D58"
t=2016-03-11 15:24:03 PST d= h= p=4718 a=LOCATION: do_start_bgworker, postmaster.c:5412
t=2016-03-11 15:24:03 PST d= h= p=7232 a=NOTICE: 00000: version "1.0" of extension "btree_gist" is already installed
t=2016-03-11 15:24:03 PST d= h= p=7232 a=LOCATION: ExecAlterExtensionStmt, extension.c:2700
t=2016-03-11 15:24:03 PST d= h= p=7232 a=NOTICE: 00000: version "0.9.2.0" of extension "bdr" is already installed
t=2016-03-11 15:24:03 PST d= h= p=7232 a=LOCATION: ExecAlterExtensionStmt, extension.c:2700
t=2016-03-11 15:24:03 PST d= h= p=7232 a=NOTICE: 42622: identifier "bdr (6260914790689848233,1,16385,): apply catchup up to 19E/10E9D58" will be truncated to "bdr (6260914790689848233,1,16385,): apply catchup up to 19E/10E"
t=2016-03-11 15:24:03 PST d= h= p=7232 a=LOCATION: truncate_identifier, scansup.c:195
t=2016-03-11 15:24:03 PST d= h= p=7232 a=DEBUG: 00000: found valid replication identifier 1
t=2016-03-11 15:24:03 PST d= h= p=7232 a=LOCATION: bdr_establish_connection_and_slot, bdr.c:572
t=2016-03-11 15:24:03 PST d= h= p=7232 a=INFO: 00000: starting up replication from 1 at 19D/D204D0C8
t=2016-03-11 15:24:03 PST d= h= p=7232 a=LOCATION: bdr_apply_main, bdr_apply.c:2550
t=2016-03-11 15:24:03 PST d= h= p=7232 a=DEBUG: 00000: bdr_apply: BEGIN origin(source, orig_lsn, timestamp): 19D/D204D3A0, 2016-03-11 13:49:47.293208-08
t=2016-03-11 15:24:03 PST d= h= p=7232 a=LOCATION: process_remote_begin, bdr_apply.c:198
t=2016-03-11 15:24:03 PST d= h= p=7232 a=ERROR: XX000: tuple natts mismatch, 26 vs 28
t=2016-03-11 15:24:03 PST d= h= p=7232 a=LOCATION: read_tuple_parts, bdr_apply.c:1892
t=2016-03-11 15:24:03 PST d= h= p=4718 a=LOG: 00000: worker process: bdr: catchup apply to 19E/10E9D58 (PID 7232) exited with exit code 1
t=2016-03-11 15:24:03 PST d= h= p=4718 a=LOCATION: LogChildExit, postmaster.c:3325
t=2016-03-11 15:24:03 PST d= h= p=4718 a=LOG: 00000: unregistering background worker "bdr: catchup apply to 19E/10E9D58"
t=2016-03-11 15:24:03 PST d= h= p=4718 a=LOCATION: ForgetBackgroundWorker, bgworker.c:376
t=2016-03-11 15:24:04 PST d= h= p=7231 a=ERROR: XX000: catchup worker exited before catching up to target LSN 19E/10E9D58
t=2016-03-11 15:24:04 PST d= h= p=7231 a=LOCATION: bdr_catchup_to_lsn, bdr_init_replica.c:1273
t=2016-03-11 15:24:04 PST d= h= p=4718 a=LOG: 00000: worker process: bdr db: svp2 (PID 7231) exited with exit code 1
t=2016-03-11 15:24:04 PST d= h= p=4718 a=LOCATION: LogChildExit, postmaster.c:3325
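
Each attempt stops in read_tuple_parts with "tuple natts mismatch, 26 vs 28", which suggests that the rows being applied and the local copy of some table disagree on their number of attributes. In PostgreSQL a column removed with ALTER TABLE ... DROP COLUMN keeps its pg_attribute entry (attisdropped = true) and still counts toward the table's attribute total, whereas a schema recreated from a pg_dump does not contain those entries. A rough check is to count attributes per table, with and without dropped columns, on both the upstream node and the new node and compare. This is a sketch under assumptions not confirmed here: that BDR's comparison counts dropped columns and that the new node's schema was copied by dump/restore. The 26/28 filter only mirrors the numbers in the error, and count(*) FILTER needs PostgreSQL 9.4 or later (which BDR 0.9 is built on).

```sql
-- Per-table attribute counts, including dropped columns.
-- Run on both the upstream node and the new node, then compare the output.
-- Assumption: the mismatch involves an ordinary user table.
SELECT c.oid::regclass                            AS table_name,
       count(*)                                   AS natts_incl_dropped,
       count(*) FILTER (WHERE NOT a.attisdropped) AS live_columns
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
JOIN pg_attribute a ON a.attrelid = c.oid
WHERE c.relkind = 'r'                              -- ordinary tables only
  AND a.attnum > 0                                 -- skip system columns
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
GROUP BY c.oid
HAVING count(*) IN (26, 28)                        -- numbers from the error; remove to list all tables
ORDER BY natts_incl_dropped, table_name;
```

A table showing 28 attributes (26 live) on the upstream but 26 on the new node would be consistent with the dropped-column explanation.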

This is from the primary node:

svp2=# SELECT
slot_name, database, active,
pg_xlog_location_diff(pg_current_xlog_insert_location(), restart_lsn)
AS retained_bytes
FROM pg_replication_slots
WHERE plugin = 'bdr';
                slot_name                | database | active | retained_bytes
-----------------------------------------+----------+--------+----------------
 bdr_16385_6260914790689848233_1_16385__ | svp2     | f      |      687816472
(1 row)
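
The retained_bytes column above is the byte distance between two WAL positions (LSNs). The same subtraction works on any pair of LSNs from the log, for example the position every catchup attempt starts from (19D/D204D0C8) and the first attempt's target (19E/10AC4F0). A small illustration using the pg_lsn type introduced in PostgreSQL 9.4; on 9.4, pg_xlog_location_diff() as used in the query above returns the same value.

```sql
-- An LSN written as X/Y is a 64-bit WAL byte position: X is the high
-- 32 bits and Y the low 32 bits, both in hex. Subtracting two pg_lsn
-- values gives the number of WAL bytes between them (as numeric).
SELECT '19E/10AC4F0'::pg_lsn - '19D/D204D0C8'::pg_lsn AS bytes_to_first_target;
-- (0x19E * 2^32 + 0x010AC4F0) - (0x19D * 2^32 + 0xD204D0C8) = 788919336,
-- roughly 752 MiB of WAL between the restart point and the target.
```

Because each retry restarts at 19D/D204D0C8 and fails on the first transaction it reads (origin LSN 19D/D204D3A0), while the target keeps moving forward (19E/10AC4F0, 19E/10BA488, 19E/10E9D58), the gap never shrinks.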

And this same scenario happens every time I try to add a new node.

Thank you,

Carter

--
View this message in context: http://postgresql.nabble.com/BDR-not-catching-up-tp5892335.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.
