Re: upgrade to repmgr3

From: Pekka Rinne <tsierkkis(at)gmail(dot)com>
To: Ian Barwick <ian(at)2ndquadrant(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: upgrade to repmgr3
Date: 2016-08-08 08:57:07
Message-ID: CACAG2VWWkuZr24LvwWrJEhYS4S72U397+eVn6m8Q2NKr8_OqAQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

hello

Thanks for your comments. They are very helpful. If you have any draft
level documentation available of the upgrade procedure I would be very
interested in seeing it and maybe trying it out as well.

Meanwhile I did some more testing with my environment using repmgr3 and
noticed an issue with promoting standby node. Here is roughly what I did.

1. Install repmgr3.1.2 RPM to all nodes as upgrade to previous 2.0.2.
2. I took repmgr release upgrade SQL-scripts from github 3.1 stable and ran
those on master (all three scripts in order).
3. on master I stopped postgresql service
4. on standby I said standby promote which does some things and then hangs
forever.

This standby promote was working fine before repmgr upgrade.

There is a COMMIT command visible with ps:

3324 ? Ss 0:00 postgres: repmgr repmgr <new master IP>(43666)
COMMIT waiting for 2/4E000548

What I see in the console is here:

[2016-08-08 10:29:03] [NOTICE] using configuration file
"/var/lib/pgsql/repmgr/repmgr.conf"
[2016-08-08 10:29:03] [INFO] connecting to standby database
[2016-08-08 10:29:03] [DEBUG] connecting to: 'host=<new master IP>
user=repmgr dbname=repmgr fallback_application_name='repmgr''
[2016-08-08 10:29:03] [INFO] connected to standby, checking its state
[2016-08-08 10:29:03] [DEBUG] is_standby(): SELECT
pg_catalog.pg_is_in_recovery()
[2016-08-08 10:29:03] [INFO] retrieving node list for cluster 'sensorlog'
[2016-08-08 10:29:03] [DEBUG] get_master_connection():
SELECT id, conninfo, CASE WHEN type = 'master' THEN 1 ELSE 2 END
AS type_priority FROM "repmgr_sensorlog".repl_nodes WHERE cluster =
'sensorlog' AND type != 'witness' ORDER BY active DESC, type_priority,
priority, id
[2016-08-08 10:29:03] [INFO] checking role of cluster node '1'
[2016-08-08 10:29:03] [DEBUG] connecting to: 'host=<old master IP>
user=repmgr dbname=repmgr fallback_application_name='repmgr''
[2016-08-08 10:29:03] [ERROR] connection to database failed: could not
connect to server: Connection refused
Is the server running on host "<old master IP>" and accepting
TCP/IP connections on port 5432?
[2016-08-08 10:29:03] [INFO] checking role of cluster node '2'
[2016-08-08 10:29:03] [DEBUG] connecting to: 'host=<new master IP>
user=repmgr dbname=repmgr fallback_application_name='repmgr''
[2016-08-08 10:29:03] [DEBUG] is_standby(): SELECT
pg_catalog.pg_is_in_recovery()
[2016-08-08 10:29:03] [INFO] checking role of cluster node '3'
[2016-08-08 10:29:03] [DEBUG] connecting to: 'host=<old master IP>
user=repmgr dbname=repmgr fallback_application_name='repmgr''
[2016-08-08 10:29:03] [DEBUG] is_standby(): SELECT
pg_catalog.pg_is_in_recovery()
[2016-08-08 10:29:03] [NOTICE] promoting standby
[2016-08-08 10:29:03] [DEBUG] get_pg_setting(): SELECT name, setting FROM
pg_settings WHERE name = 'data_directory'
[2016-08-08 10:29:03] [DEBUG] get_pg_setting(): returned value is
"/var/lib/pgsql/data"
[2016-08-08 10:29:03] [NOTICE] promoting server using '/usr/bin/pg_ctl -D
/var/lib/pgsql/data promote'
server promoting
[2016-08-08 10:29:03] [INFO] reconnecting to promoted server
[2016-08-08 10:29:03] [DEBUG] connecting to: 'host=<new master IP>
user=repmgr dbname=repmgr fallback_application_name='repmgr''
[2016-08-08 10:29:03] [DEBUG] is_standby(): SELECT
pg_catalog.pg_is_in_recovery()
[2016-08-08 10:29:05] [DEBUG] is_standby(): SELECT
pg_catalog.pg_is_in_recovery()
[2016-08-08 10:29:05] [DEBUG] setting node 2 as master and marking existing
master as failed
[2016-08-08 10:29:05] [DEBUG] begin_transaction()
[2016-08-08 10:29:05] [DEBUG] commit_transaction()

The system is left in a strange state after this. If I start postgresql
again in old master node and issue cluster show it lists both nodes as
masters.

In this experiment I did not take slot based replication into use at all.
Everything is left just as it was before except the repmgr upgrade.

br,
Pekka

2016-08-04 16:19 GMT+03:00 Ian Barwick <ian(at)2ndquadrant(dot)com>:

> Hi
>
> On 08/04/2016 05:57 PM, Pekka Rinne wrote:
>
>> hi!
>>
>> I have been using postgres 9.4 and repmgr2.0 combination and been doing
>> replication (hot standby). Now I'd like to start doing slot based
>> replication and have installed repmgr3 and exeuted the provided sql
>> scripts
>> and also added use_replication_slots=1 into repmgr.conf.
>>
>> The question is that what is the correct procedure to switch into using
>> slots (max_replication_slots) in this case as the system has been set up
>> already without them? Do I have to unregister and re-register all the
>> standbys? Hopefully re-clone could be avoided.
>>
>
> No reclone needed.
>
> What I tried was that I configured max_replication_hosts=5, restarted
>> master, created some slots using select * from
>> pg_create_physical_replication_slot(<name>), configured one created slot
>> into recovery.conf in the slave. What I noticed was that replication
>> seemed
>> to be still working after this but in repl_nodes table slot_name remained
>> empty. Then I did standby re-register with force and slot_name was filled
>> with repmgr_slot_2 value which is not the name I gave for the slot. I
>> think
>> repmgr invented this name but in the pg_replication_slots table
>> repmgr_slot_2 does not exist. There is only the slot I created myself
>> (active=t). So I guess this approach is not quite right.
>>
>> What if I just skip doing the re-register. Does is matter if slot_name
>> remains empty in repl_nodes?
>>
>
> This won't affect replication, however if you attempt any failover actions
> using repmgr (e.g. following a new master), it will probably cause
> problems when
> attempting to create a replication slot on the new master.
>
> As a workaround you can manually update the slot name in the repl_nodes
> table
> to match the one you've chosen. We'll update repmgr to better handle this
> kind of situation. I don't think we've had this particular use-case before,
> so I'll add some notes to the documentation on how best to handle it.
>
>
> Regards
>
> Ian Barwick
>
> --
> Ian Barwick http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2016-08-08 13:51:13 Re: lower() silently fails for 9.5.3 on Windows, but works on Mac, Linux
Previous Message Charles Clavadetscher 2016-08-08 08:27:12 Re: lower() silently fails for 9.5.3 on Windows, but works on Mac, Linux