Re: [BDR] Node Join Question

From: "Wayne E(dot) Seguin" <wayneeseguin(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: [BDR] Node Join Question
Date: 2015-05-12 06:37:23
Message-ID: CANf8RLsp5k4XsYiRAJWJ15ppfX_yU4KVVwT2tFVkwKkZd4KHGw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Also, what is the proper way to locate and remove these replication slots
as mentioned in this log?

On Mon, May 11, 2015 at 11:36 PM, Wayne E. Seguin <wayneeseguin(at)gmail(dot)com>
wrote:

> Also,
>
> Is there a way to remove these things from the init target node easier?
>
> d= p=504 a=ERROR: 55000: previous init failed, manual cleanup is required
> d= p=504 a=DETAIL: Found bdr.bdr_nodes entry for bdr
> (6147869128174526660,1,16908,) with state=i in remote bdr.bdr_nodes
> d= p=504 a=HINT: Remove all replication identifiers and slots
> corresponding to this node from the init target node then drop and recreate
> this database and try again
>
>
>
> On Mon, May 11, 2015 at 11:33 PM, Wayne E. Seguin <wayneeseguin(at)gmail(dot)com>
> wrote:
>
>> To the wonderful BDR folks,
>>
>> I have a 5 node pg+bdr cluster that I've spun up. I have verified that
>> each node can connect via psql to each other node. I am having an issue
>> creating the bdr group and having the nodes join each other.
>>
>> The steps I am taking are as follows (I used pg compiled using getting
>> started guide instructions):
>> 1. initdb
>> 2. start postgres
>> 3. create database pgbdr
>> 4. create extension btree_gist (actually I'm doing this on template1 for
>> my purposes)
>> 5. create extension bdr
>> 6. on node 0 I am doing:
>> SELECT bdr.bdr_group_create(
>> local_node_name := 'pgbdr0',
>> node_external_dsn := 'host=10.244.2.2 port=5432 user=postgres
>> dbname=pgbdr'
>> );
>> 7. on nodes 1-4 I am doing (adjusted for the nodes IP):
>> SELECT bdr.bdr_group_join(
>> local_node_name := 'pgbdr1',
>> node_external_dsn := 'host=10.244.2.6 port=5432 user=postgres
>> dbname=pgbdr',
>> join_using_dsn := 'host=10.244.2.2 port=5432 user=postgres
>> dbname=pgbdr'
>> );
>>
>> Sometimes this works and all nodes are joined, but it isn't consistent
>> and I'm not sure at the moment how to debug why.
>>
>> From the most recent attempt on node 0:
>>
>> pgbdr=# SELECT * FROM bdr.bdr_nodes;
>> node_sysid | node_timeline | node_dboid | node_status |
>> node_name | node_local_dsn |
>> node_init_from_dsn
>>
>> ---------------------+---------------+------------+-------------+-----------+------------------------------------------------------+--------------------
>> 6147857338297172162 | 1 | 16908 | r | rdpg0
>> | host=10.244.2.2 port=5432 user=postgres dbname=pgbdr |
>> (1 row)
>>
>> From the most recent attempt on node 1:
>>
>> pgbdr=# SELECT * FROM bdr.bdr_nodes;
>> node_sysid | node_timeline | node_dboid | node_status |
>> node_name | node_local_dsn |
>> node_init_from_d
>> sn
>>
>> ---------------------+---------------+------------+-------------+-----------+------------------------------------------------------+----------------------------------
>> --------------------
>> 6147857345042854082 | 1 | 16908 | i | rdpg1
>> | host=10.244.2.6 port=5432 user=postgres dbname=pgbdr | host=10.244.2.2
>> port=5432 user=po
>> stgres dbname=pgbdr
>> (1 row)
>>
>> I have pasted some relevant logs from node 0 further below.
>>
>> Even if I use 'SELECT bdr.bdr_node_join_wait_for_ready();' it will
>> usually just hang.
>>
>> How can I debug why the nodes are not connecting to each other find out
>> where they getting stuck trying to join?
>>
>> ~Wayne
>>
>> Wayne E. Seguin
>> wayneeseguin(at)gmail(dot)com
>> wayneeseguin on irc.freenode.net
>> http://twitter.com/wayneeseguin/
>> https://github.com/wayneeseguin/
>>
>>
>>
>> d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init DEBUG: 00000:
>> received replication command: IDENTIFY_SYSTEM
>> [171/773]
>> d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init LOCATION:
>> exec_replication_command, walsender.c:1291
>> d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init DEBUG: 00000:
>> received replication command: CREATE_REPLICATION_SLOT
>> "bdr_16908_6147857345042854082_1_16908__"
>> LOGICAL bdr
>> d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init LOCATION:
>> exec_replication_command, walsender.c:1291
>> d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init DEBUG: 00000:
>> searching for logical decoding starting point, starting at 0/18E87A0
>> d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init LOCATION:
>> DecodingContextFindStartpoint, logical.c:469
>> d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init LOG: 00000:
>> logical decoding found consistent point at 0/18E87A0
>> d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init DETAIL: There
>> are no running transactions.
>> d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init LOCATION:
>> SnapBuildFindSnapshot, snapbuild.c:1271
>> d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init LOG: 00000:
>> exported logical decoding snapshot: "0000071E-1" with 0 transaction IDs
>> d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init LOCATION:
>> SnapBuildExportSnapshot, snapbuild.c:604
>> d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init DEBUG: 00000:
>> received replication command: IDENTIFY_SYSTEM
>> d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOCATION:
>> exec_replication_command, walsender.c:1291
>> d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init DEBUG: 00000:
>> received replication command: CREATE_REPLICATION_SLOT
>> "bdr_16908_6147857344392790209_1_16908__"
>> LOGICAL bdr
>> d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOCATION:
>> exec_replication_command, walsender.c:1291
>> d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init DEBUG: 00000:
>> searching for logical decoding starting point, starting at 0/18EC970
>> d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOCATION:
>> DecodingContextFindStartpoint, logical.c:469
>> d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOG: 00000:
>> logical decoding found initial starting point at 0/18EC970
>> d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init DETAIL: 1
>> transaction needs to finish.
>> d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOCATION:
>> SnapBuildFindSnapshot, snapbuild.c:1338
>> d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init DEBUG: 00000:
>> received replication command: IDENTIFY_SYSTEM
>> d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOCATION:
>> exec_replication_command, walsender.c:1291
>> d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init DEBUG: 00000:
>> received replication command: CREATE_REPLICATION_SLOT
>> "bdr_16908_6147857340000055490_1_16908__"
>> LOGICAL bdr
>> d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOCATION:
>> exec_replication_command, walsender.c:1291
>> d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init LOG: 08006:
>> unexpected EOF on client connection with an open transaction
>> d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init LOCATION:
>> SocketBackend, postgres.c:344
>> d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init DEBUG: 00000:
>> searching for logical decoding starting point, starting at 0/18EC9F0
>> d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOCATION:
>> DecodingContextFindStartpoint, logical.c:469
>> d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOG: 00000:
>> logical decoding found initial starting point at 0/18EC9F0
>> d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init DETAIL: 1
>> transaction needs to finish.
>> d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOCATION:
>> SnapBuildFindSnapshot, snapbuild.c:1338
>> d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOG: 00000:
>> logical decoding found consistent point at 0/18ECA30
>> d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init DETAIL:
>> Transaction ID 1822 finished; no more running transactions.
>> d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOCATION:
>> SnapBuildEndTxn, snapbuild.c:907
>> d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOG: 00000:
>> logical decoding found consistent point at 0/18ECA30
>> d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init DETAIL:
>> Transaction ID 1822 finished; no more running transactions.
>> d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOCATION:
>> SnapBuildEndTxn, snapbuild.c:907
>> d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOG: 00000:
>> exported logical decoding snapshot: "00000720-1" with 0 transaction IDs
>> d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOCATION:
>> SnapBuildExportSnapshot, snapbuild.c:604
>> d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOG: 00000:
>> exported logical decoding snapshot: "0000071F-1" with 1 transaction ID
>> d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOCATION:
>> SnapBuildExportSnapshot, snapbuild.c:604
>> d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init DEBUG: 00000:
>> received replication command: IDENTIFY_SYSTEM
>> d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOCATION:
>> exec_replication_command, walsender.c:1291
>> d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init DEBUG: 00000:
>> received replication command: CREATE_REPLICATION_SLOT
>> "bdr_16908_6147857340923527359_1_16908__"
>> LOGICAL bdr
>> d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOCATION:
>> exec_replication_command, walsender.c:1291
>> d= p=392 a=DEBUG: 00000: started 0 elections
>> d= p=392 a=LOCATION: bdr_sequencer_start_elections, bdr_seq.c:688
>> d= p=392 a=DEBUG: 00000: started 0 votes
>> d= p=392 a=LOCATION: bdr_sequencer_vote, bdr_seq.c:629
>> d= p=392 a=DEBUG: 00000: tallied 0 elections
>> d= p=392 a=LOCATION: bdr_sequencer_tally, bdr_seq.c:755
>> d= p=392 a=DEBUG: 00000: checked 0 sequences for filling
>> d= p=392 a=LOCATION: bdr_sequencer_fill_sequences, bdr_seq.c:1034
>> d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOG: 08006:
>> unexpected EOF on client connection with an open transaction
>> d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOCATION:
>> SocketBackend, postgres.c:344
>> d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOG: 08006:
>> unexpected EOF on client connection with an open transaction
>> d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOCATION:
>> SocketBackend, postgres.c:344
>> d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init DEBUG: 00000:
>> searching for logical decoding starting point, starting at 0/18ECA60
>> d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOCATION:
>> DecodingContextFindStartpoint, logical.c:469
>> d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOG: 00000:
>> logical decoding found initial starting point at 0/18ECA60
>> d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init DETAIL: 2
>> transactions need to finish.
>> d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOCATION:
>> SnapBuildFindSnapshot, snapbuild.c:1338
>> d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOG: 00000:
>> logical decoding found consistent point at 0/18ECAD0
>> d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init DETAIL:
>> Transaction ID 1823 finished; no more running transactions.
>> d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOCATION:
>> SnapBuildEndTxn, snapbuild.c:907
>> d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOG: 00000:
>> exported logical decoding snapshot: "00000721-1" with 0 transaction IDs
>> d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOCATION:
>> SnapBuildExportSnapshot, snapbuild.c:604
>> d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOG: 08006:
>> unexpected EOF on client connection with an open transaction
>> d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOCATION:
>> SocketBackend, postgres.c:344
>>
>
>
>
> --
> ~Wayne
>
> Wayne E. Seguin
> wayneeseguin(at)gmail(dot)com
> wayneeseguin on irc.freenode.net
> http://twitter.com/wayneeseguin/
> https://github.com/wayneeseguin/
>

--
~Wayne

Wayne E. Seguin
wayneeseguin(at)gmail(dot)com
wayneeseguin on irc.freenode.net
http://twitter.com/wayneeseguin/
https://github.com/wayneeseguin/

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Peter Mogensen 2015-05-12 07:40:16 Re: Comparing txid and xmin (under BDR)
Previous Message Wayne E. Seguin 2015-05-12 06:36:09 Re: [BDR] Node Join Question