Re: pg_upgrade resets timeline to 1

From: Noah Misch <noah(at)leadboat(dot)com>
To: Christoph Berg <myon(at)debian(dot)org>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Marco Nenciarini <mnencia(at)debian(dot)org>
Subject: Re: pg_upgrade resets timeline to 1
Date: 2015-05-28 07:27:21
Message-ID: 20150528072721.GA4102649@tornado.leadboat.com
Lists: pgsql-hackers

On Wed, May 27, 2015 at 05:40:09PM +0200, Christoph Berg wrote:
> commit 4c5e060049a3714dd27b7f4732fe922090edea69
> Author: Bruce Momjian <bruce(at)momjian(dot)us>
> Date: Sat May 16 00:40:18 2015 -0400
>
> pg_upgrade: force timeline 1 in the new cluster
>
> Previously, this prevented promoted standby servers from being upgraded
> because of a missing WAL history file. (Timeline 1 doesn't need a
> history file, and we don't copy WAL files anyway.)
>
> Pardon me for starting a fresh thread, but I couldn't find where this
> was discussed.
>
> I've just had trouble getting barman to work again after a 9.1->9.4.2
> upgrade, and I think part of the problem was that the WAL for this
> cluster got reset from timeline 2 to 1, which made barman's incoming
> WALs processor drop the files, probably because the new filename
> 0001... is now "less" than the 0002... before.

It looks like an upgrade from 9.1.x to 9.3.0 or later has always set the new
timeline identifier (TLI) to 1. My testing confirms this for an upgrade from
9.1.16 to 9.4.1 and for an upgrade from 9.1.16 to 9.4.2, so I failed to
reproduce your report. Would you verify the versions you used? If you were
upgrading from 9.3.x, I _can_ reproduce that.
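
For readers following the file name point in the quoted report: a WAL segment file name is 24 hexadecimal digits, and the first 8 encode the timeline ID. The sketch below only illustrates why a segment written on timeline 1 sorts before one written on timeline 2. The segment names are made up, and nothing here is taken from barman's code, whose actual comparison logic I have not checked.

```python
def wal_timeline(segment_name: str) -> int:
    """Return the timeline ID encoded in a 24-hex-digit WAL segment file name."""
    if len(segment_name) != 24:
        raise ValueError("expected a 24-character WAL segment name")
    # The first 8 hex digits are the timeline ID.
    return int(segment_name[:8], 16)


# Hypothetical segment names, for illustration only.
before_upgrade = "000000020000000000000005"  # old cluster, timeline 2
after_upgrade = "000000010000000000000003"   # new cluster, timeline 1

print(wal_timeline(before_upgrade), wal_timeline(after_upgrade))
# Plain string comparison puts the timeline-1 name first.
print(after_upgrade < before_upgrade)
```

A tool that assumes file names only ever increase would treat the post-upgrade segment as older than what it has already seen.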

Since the 2015-05-16 commits you cite, pg_upgrade always sets TLI=1. Behavior
before those commits depended on the source and destination major versions.
PostgreSQL 9.0, 9.1 and 9.2 restored the TLI regardless of source version.
PostgreSQL 9.3 and 9.4 restored the TLI when upgrading from 9.3 or 9.4, but
they set TLI=1 when upgrading from 9.2 or earlier. (Commit 038f3a0 introduced
this inconsistent behavior of 9.3 and later.)
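
The rules in the paragraph above can be written down as a small decision function. This is only a restatement of that description for destination versions 9.0 through 9.4, not pg_upgrade code; the function name, the tuple encoding of major versions and the `has_2015_05_16_commits` flag are all made up for illustration.

```python
def tli_after_pg_upgrade(source, dest, old_tli, has_2015_05_16_commits=False):
    """Predict the new cluster's TLI as described above.

    source and dest are major versions as tuples, e.g. (9, 1).
    has_2015_05_16_commits says whether the destination pg_upgrade
    includes the 2015-05-16 commits (hypothetical flag).
    """
    if has_2015_05_16_commits:
        return 1            # pg_upgrade now always forces TLI=1
    if dest <= (9, 2):
        return old_tli      # 9.0, 9.1, 9.2 restored the TLI
    if source >= (9, 3):
        return old_tli      # 9.3/9.4 restored it for 9.3/9.4 sources
    return 1                # 9.3/9.4 upgrading from 9.2 or earlier


print(tli_after_pg_upgrade((9, 1), (9, 4), old_tli=2))
print(tli_after_pg_upgrade((9, 3), (9, 4), old_tli=2))
print(tli_after_pg_upgrade((9, 3), (9, 4), old_tli=2,
                           has_2015_05_16_commits=True))
```

Under these rules a 9.1 source already lands on TLI=1 with or without the commits, while a 9.3 source is the case where the commits change the outcome.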

The commit you cite fixed this symptom:
http://www.postgresql.org/message-id/flat/D5359E0908278642BB1747131D62694DAB22560F(at)AUSMXMBX01(dot)mrws(dot)biz

I'm attaching a test script that I used to observe TLI assignment and to test
for that problem. pg_upgrade has been restoring TLI without history files
since 9.0.0 or earlier, and that was always risky. The reported symptom
became possible with the introduction of the TIMELINE_HISTORY walsender
command in 9.3.0. (It was hard to encounter before 9.4, because 9.3 to 9.3
pg_upgrade runs are rare outside of hacker testing.)

Since you observed barman breakage less than a week after a release that
changed the post-pg_upgrade TLI, it seems prudent to figure that other folks
will be affected. At the same time, I don't understand why that release would
prompt the first report. Any upgrade from {9.0,9.1,9.2} to {9.3,9.4} already
had the behavior you experienced. Ideas?

> I don't expect to be able to recover through a pg_upgrade operation,
> but pg_upgrade shouldn't make things more complicated than they should
> be for backup tools. (If there's a problem with the history files,
> shouldn't pg_upgrade copy them instead?)
>
> In fact, I'm wondering if pg_upgrade shouldn't rather *increase* the
> timeline to make sure the archive_command doesn't clobber any files
> from the old cluster when reused in the new cluster?

It's worth considering that, as a major-release change. Do note this in the
documentation, though:

  The archive command should generally be designed to refuse to overwrite any
  pre-existing archive file. This is an important safety feature to preserve
  the integrity of your archive in case of administrator error (such as
  sending the output of two different servers to the same archive directory).
  -- http://www.postgresql.org/docs/devel/static/continuous-archiving.html
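
To make that advice concrete, here is a minimal sketch of a copy helper that refuses to overwrite an existing archive file. It is an illustration only: `archive_segment` and the directory layout are hypothetical, and it is not the command from the documentation or from the attached script. It relies on `O_EXCL` so that the existence check and the file creation happen in one step.

```python
import os
import shutil


def archive_segment(src_path, archive_dir):
    """Copy src_path into archive_dir; return 0 on success, 1 if the
    destination already exists (the existing file is left untouched)."""
    dest = os.path.join(archive_dir, os.path.basename(src_path))
    try:
        # O_CREAT | O_EXCL fails if dest already exists.
        fd = os.open(dest, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except FileExistsError:
        return 1
    with os.fdopen(fd, "wb") as out, open(src_path, "rb") as src:
        shutil.copyfileobj(src, out)
        out.flush()
        os.fsync(out.fileno())  # push the copy to disk before reporting success
    return 0
```

A wrapper script could call this with the path that archive_command passes as %p and exit with the returned status, so that a name collision between the old and new cluster shows up as an archiving failure rather than a silently replaced file. The sketch does not clean up a partially written file if the copy itself fails.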

Attachment: upgrade-timeline.sh (application/x-sh, 2.3 KB)
