[BDR] Node Join Question

From: "Wayne E(dot) Seguin" <wayneeseguin(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: [BDR] Node Join Question
Date: 2015-05-12 06:33:59
Message-ID: CANf8RLvYXsLPxOHUcSvdBcVNkTmNiHKkMY2_ncer3egky+ac+g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

To the wonderful BDR folks,

I have a 5 node pg+bdr cluster that I've spun up. I have verified that each
node can connect via psql to each other node. I am having an issue creating
the bdr group and having the nodes join each other.

The steps I am taking are as follows (I used pg compiled using getting
started guide instructions):
1. initdb
2. start postgres
3. create database pgbdr
4. create extension btree_gist (actually I'm doing this on template1 for my
purposes)
5. create extension bdr
6. on node 0 I am doing:
SELECT bdr.bdr_group_create(
local_node_name := 'pgbdr0',
node_external_dsn := 'host=10.244.2.2 port=5432 user=postgres
dbname=pgbdr'
);
7. on nodes 1-4 I am doing (adjusted for the nodes IP):
SELECT bdr.bdr_group_join(
local_node_name := 'pgbdr1',
node_external_dsn := 'host=10.244.2.6 port=5432 user=postgres
dbname=pgbdr',
join_using_dsn := 'host=10.244.2.2 port=5432 user=postgres dbname=pgbdr'
);

Sometimes this works and all nodes are joined, but it isn't consistent and
I'm not sure at the moment how to debug why.

From the most recent attempt on node 0:

pgbdr=# SELECT * FROM bdr.bdr_nodes;
node_sysid | node_timeline | node_dboid | node_status | node_name
| node_local_dsn | node_init_from_dsn
---------------------+---------------+------------+-------------+-----------+------------------------------------------------------+--------------------
6147857338297172162 | 1 | 16908 | r | rdpg0
| host=10.244.2.2 port=5432 user=postgres dbname=pgbdr |
(1 row)

From the most recent attempt on node 1:

pgbdr=# SELECT * FROM bdr.bdr_nodes;
node_sysid | node_timeline | node_dboid | node_status | node_name
| node_local_dsn |
node_init_from_d
sn
---------------------+---------------+------------+-------------+-----------+------------------------------------------------------+----------------------------------
--------------------
6147857345042854082 | 1 | 16908 | i | rdpg1
| host=10.244.2.6 port=5432 user=postgres dbname=pgbdr | host=10.244.2.2
port=5432 user=po
stgres dbname=pgbdr
(1 row)

I have pasted some relevant logs from node 0 further below.

Even if I use 'SELECT bdr.bdr_node_join_wait_for_ready();' it will usually
just hang.

How can I debug why the nodes are not connecting to each other find out
where they getting stuck trying to join?

~Wayne

Wayne E. Seguin
wayneeseguin(at)gmail(dot)com
wayneeseguin on irc.freenode.net
http://twitter.com/wayneeseguin/
https://github.com/wayneeseguin/

d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init DEBUG: 00000:
received replication command: IDENTIFY_SYSTEM
[171/773]
d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init LOCATION:
exec_replication_command, walsender.c:1291
d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init DEBUG: 00000:
received replication command: CREATE_REPLICATION_SLOT
"bdr_16908_6147857345042854082_1_16908__"
LOGICAL bdr
d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init LOCATION:
exec_replication_command, walsender.c:1291
d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init DEBUG: 00000:
searching for logical decoding starting point, starting at 0/18E87A0
d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init LOCATION:
DecodingContextFindStartpoint, logical.c:469
d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init LOG: 00000:
logical decoding found consistent point at 0/18E87A0
d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init DETAIL: There are
no running transactions.
d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init LOCATION:
SnapBuildFindSnapshot, snapbuild.c:1271
d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init LOG: 00000:
exported logical decoding snapshot: "0000071E-1" with 0 transaction IDs
d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init LOCATION:
SnapBuildExportSnapshot, snapbuild.c:604
d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init DEBUG: 00000:
received replication command: IDENTIFY_SYSTEM
d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOCATION:
exec_replication_command, walsender.c:1291
d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init DEBUG: 00000:
received replication command: CREATE_REPLICATION_SLOT
"bdr_16908_6147857344392790209_1_16908__"
LOGICAL bdr
d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOCATION:
exec_replication_command, walsender.c:1291
d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init DEBUG: 00000:
searching for logical decoding starting point, starting at 0/18EC970
d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOCATION:
DecodingContextFindStartpoint, logical.c:469
d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOG: 00000:
logical decoding found initial starting point at 0/18EC970
d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init DETAIL: 1
transaction needs to finish.
d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOCATION:
SnapBuildFindSnapshot, snapbuild.c:1338
d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init DEBUG: 00000:
received replication command: IDENTIFY_SYSTEM
d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOCATION:
exec_replication_command, walsender.c:1291
d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init DEBUG: 00000:
received replication command: CREATE_REPLICATION_SLOT
"bdr_16908_6147857340000055490_1_16908__"
LOGICAL bdr
d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOCATION:
exec_replication_command, walsender.c:1291
d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init LOG: 08006:
unexpected EOF on client connection with an open transaction
d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init LOCATION:
SocketBackend, postgres.c:344
d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init DEBUG: 00000:
searching for logical decoding starting point, starting at 0/18EC9F0
d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOCATION:
DecodingContextFindStartpoint, logical.c:469
d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOG: 00000:
logical decoding found initial starting point at 0/18EC9F0
d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init DETAIL: 1
transaction needs to finish.
d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOCATION:
SnapBuildFindSnapshot, snapbuild.c:1338
d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOG: 00000:
logical decoding found consistent point at 0/18ECA30
d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init DETAIL:
Transaction ID 1822 finished; no more running transactions.
d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOCATION:
SnapBuildEndTxn, snapbuild.c:907
d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOG: 00000:
logical decoding found consistent point at 0/18ECA30
d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init DETAIL:
Transaction ID 1822 finished; no more running transactions.
d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOCATION:
SnapBuildEndTxn, snapbuild.c:907
d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOG: 00000:
exported logical decoding snapshot: "00000720-1" with 0 transaction IDs
d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOCATION:
SnapBuildExportSnapshot, snapbuild.c:604
d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOG: 00000:
exported logical decoding snapshot: "0000071F-1" with 1 transaction ID
d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOCATION:
SnapBuildExportSnapshot, snapbuild.c:604
d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init DEBUG: 00000:
received replication command: IDENTIFY_SYSTEM
d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOCATION:
exec_replication_command, walsender.c:1291
d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init DEBUG: 00000:
received replication command: CREATE_REPLICATION_SLOT
"bdr_16908_6147857340923527359_1_16908__"
LOGICAL bdr
d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOCATION:
exec_replication_command, walsender.c:1291
d= p=392 a=DEBUG: 00000: started 0 elections
d= p=392 a=LOCATION: bdr_sequencer_start_elections, bdr_seq.c:688
d= p=392 a=DEBUG: 00000: started 0 votes
d= p=392 a=LOCATION: bdr_sequencer_vote, bdr_seq.c:629
d= p=392 a=DEBUG: 00000: tallied 0 elections
d= p=392 a=LOCATION: bdr_sequencer_tally, bdr_seq.c:755
d= p=392 a=DEBUG: 00000: checked 0 sequences for filling
d= p=392 a=LOCATION: bdr_sequencer_fill_sequences, bdr_seq.c:1034
d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOG: 08006:
unexpected EOF on client connection with an open transaction
d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOCATION:
SocketBackend, postgres.c:344
d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOG: 08006:
unexpected EOF on client connection with an open transaction
d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOCATION:
SocketBackend, postgres.c:344
d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init DEBUG: 00000:
searching for logical decoding starting point, starting at 0/18ECA60
d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOCATION:
DecodingContextFindStartpoint, logical.c:469
d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOG: 00000:
logical decoding found initial starting point at 0/18ECA60
d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init DETAIL: 2
transactions need to finish.
d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOCATION:
SnapBuildFindSnapshot, snapbuild.c:1338
d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOG: 00000:
logical decoding found consistent point at 0/18ECAD0
d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init DETAIL:
Transaction ID 1823 finished; no more running transactions.
d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOCATION:
SnapBuildEndTxn, snapbuild.c:907
d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOG: 00000:
exported logical decoding snapshot: "00000721-1" with 0 transaction IDs
d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOCATION:
SnapBuildExportSnapshot, snapbuild.c:604
d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOG: 08006:
unexpected EOF on client connection with an open transaction
d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOCATION:
SocketBackend, postgres.c:344

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Wayne E. Seguin 2015-05-12 06:36:09 Re: [BDR] Node Join Question
Previous Message Peter Mogensen 2015-05-12 05:19:07 Re: Comparing txid and xmin (under BDR)