Re: pg_rewind in contrib

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Satoshi Nagayasu <snaga(at)uptime(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: pg_rewind in contrib
Date: 2014-12-16 09:37:33
Message-ID: 548FFD5D.80703@vmware.com
Lists: pgsql-hackers

On 12/16/2014 11:23 AM, Satoshi Nagayasu wrote:
> Hi,
>
> On 2014/12/12 23:13, Heikki Linnakangas wrote:
> > Hi,
> >
> > I'd like to include pg_rewind in contrib. I originally wrote it as an
> > external project so that I could quickly get it working with the
> > existing versions, and because I didn't feel it was quite ready for
> > production use yet. Now, with the WAL format changes in master, it is a
> > lot more maintainable than before. Many bugs have been fixed since the
> > first prototypes, and I think it's fairly robust now.
> >
> > I propose that we include pg_rewind in contrib/ now. Attached is a patch
> > for that. It just includes the latest sources from the current pg_rewind
> > repository at https://github.com/vmware/pg_rewind. It is released under
> > the PostgreSQL license.
> >
> > For those who are not familiar with pg_rewind, it's a tool that allows
> > repurposing an old master server as a new standby server, after
> > promotion, even if the old master was not shut down cleanly. That's a
> > very often requested feature.
>
> I'm trying out pg_rewind with a first, simple scenario.
> The test script is here:
>
> https://github.com/snaga/pg_rewind_test/blob/master/pg_rewind_test.sh
>
> At least, I think the file descriptor "srcfd" should be closed before
> exiting copy_file_range(). I got a "can't open file" error with
> "too many open files" while running pg_rewind.
>
> ------------------------------------------------
> diff --git a/contrib/pg_rewind/copy_fetch.c b/contrib/pg_rewind/copy_fetch.c
> index bea1b09..5a8cc8e 100644
> --- a/contrib/pg_rewind/copy_fetch.c
> +++ b/contrib/pg_rewind/copy_fetch.c
> @@ -280,6 +280,8 @@ copy_file_range(const char *path, off_t begin, off_t end, bool trunc)
>  		write_file_range(buf, begin, readlen);
>  		begin += readlen;
>  	}
> +
> +	close(srcfd);
>  }
>
> /*
> ------------------------------------------------

Yep, good catch. I pushed a fix to the pg_rewind repository on GitHub.
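
For illustration, the leak and its fix can be sketched outside of pg_rewind. This is a minimal sketch, not the pg_rewind source: the function name copy_range_sketch, the already-open destination descriptor dstfd, and the 8 kB buffer size are all assumptions made for the example. The point it shows is that the source descriptor is opened once per call, so it has to be closed on every return path, or repeated calls eventually hit the per-process descriptor limit.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the pg_rewind code): copy bytes [begin, end)
 * of src_path to the same offsets in the already-open descriptor dstfd.
 * Returns 0 on success, -1 on error.
 */
static int
copy_range_sketch(const char *src_path, int dstfd, off_t begin, off_t end)
{
	char		buf[8192];
	int			srcfd;
	int			rc = 0;

	srcfd = open(src_path, O_RDONLY);
	if (srcfd < 0)
		return -1;

	/* Position both files at the start of the range. */
	if (lseek(srcfd, begin, SEEK_SET) == (off_t) -1 ||
		lseek(dstfd, begin, SEEK_SET) == (off_t) -1)
		rc = -1;

	while (rc == 0 && begin < end)
	{
		size_t		want = sizeof(buf);
		ssize_t		readlen;

		if ((off_t) want > end - begin)
			want = (size_t) (end - begin);

		readlen = read(srcfd, buf, want);
		if (readlen <= 0)
			rc = -1;			/* read error or unexpected EOF */
		else if (write(dstfd, buf, (size_t) readlen) != readlen)
			rc = -1;			/* write error; a short write is treated as failure */
		else
			begin += readlen;
	}

	/* Single exit point: the descriptor is released on every path. */
	if (close(srcfd) != 0)
		rc = -1;

	return rc;
}
```

Routing every exit through the one close() call is a design choice for the sketch; without it, each call would leave one descriptor open and a large data directory would run out of them.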

> And I have one question here.
>
> pg_rewind assumes that the source PostgreSQL has, at least, one
> checkpoint after getting promoted. I think the target timeline id
> in the pg_control file to be read is only available after the first
> checkpoint. Right?

Yes, it does assume that the source server (= old standby, new master)
has had at least one checkpoint after promotion. It should probably be
more explicit about that: if there hasn't been a checkpoint, you
currently get the error "source and target cluster are both on the same
timeline", which isn't very informative.
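
A more explicit check might look roughly like the following. This is only a sketch under stated assumptions: the function name timeline_check_message and the wording of the hint are hypothetical, not what pg_rewind does, and the TimeLineID typedef is repeated here just to keep the example self-contained (PostgreSQL defines it as a 32-bit unsigned integer). It takes the two timeline IDs as read from each cluster's control file and, when they match, returns a message that points at the missing checkpoint as a possible cause.

```c
#include <stdint.h>
#include <stdio.h>

/* Stand-in for PostgreSQL's TimeLineID so the sketch compiles alone. */
typedef uint32_t TimeLineID;

/*
 * Hypothetical helper: return NULL when the timelines differ (rewinding
 * can proceed), or an explanatory message when they are the same.
 */
static const char *
timeline_check_message(TimeLineID source_tli, TimeLineID target_tli)
{
	if (source_tli == target_tli)
		return "source and target cluster are both on the same timeline; "
			"if the source was promoted recently, run CHECKPOINT on it "
			"and try again";

	return NULL;
}
```

A caller would print the returned string and exit when it is not NULL. Note that the same condition also covers clusters that really never diverged, so the hint is deliberately phrased as a possibility.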

I assume that by "target timeline ID" you meant the timeline ID of the
source server, i.e. the timeline that the target server should be
rewound to.

- Heikki
