From: | Simon Riggs <simon(at)2ndquadrant(dot)com> |
---|---|
To: | Heikki Linnakangas <heikki(at)enterprisedb(dot)com> |
Cc: | pgsql-patches(at)postgresql(dot)org |
Subject: | Re: Verified fix for Bug 4137 |
Date: | 2008-05-06 11:23:03 |
Message-ID: | 1210072983.4435.263.camel@ebony.site |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-patches |
On Tue, 2008-05-06 at 12:02 +0100, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > The problem was that at the very start of archive recovery the %r
> > parameter in restore_command could be set to a filename later than the
> > currently requested filename (%f). This could lead to early truncation
> > of the archived WAL files and would cause warm standby replication to
> > fail soon afterwards, in certain specific circumstances.
> >
> > Fix applied to both core server in generating correct %r filenames and
> > also to pg_standby to prevent acceptance of out-of-sequence filenames.
>
> So the core problem is that we use ControlFile->checkPointCopy.redo in
> RestoreArchivedFile to determine the safe truncation point, but when
> there's a backup label file, that's still coming from pg_control file,
> which is wrong.
>
> The patch fixes that by determining the safe truncation point as
> Min(checkPointCopy.redo, xlogfname), where xlogfname is the xlog file
> being restored. That depends on the assumption that everything before
> the first file we (try to) restore is safe to truncate. IOW, we never
> try to restore file B first, and then A, where A < B.
>
> I'm not totally convinced that's a safe assumption. As an example,
> consider doing an archive recovery, but without a backup label, and the
> latest checkpoint record is broken. We would try to read the latest
> (broken) checkpoint record first, and call RestoreArchivedFile to get
> the xlog file containing that. But because that record is broken, we
> fall back to using the previous checkpoint, and will need the xlog file
> where the previous checkpoint record is in.
>
> That's a pretty contrived example, but the point is that assumption
> feels fragile. At the very least it should be noted explicitly in the
> comments. A less fragile approach would be to use something dummy, like
> "000000000000000000000000" as the truncation point until we've
> successfully read the checkpoint/restartpoint record and started the replay.
Your comments are interesting and I'll think through that some more to
see if I can see if that leads to bugs. Will talk more later.
The only valid starting place, in all cases, is the checkpoint written
by pg_start_backup(). The user doesn't need to have been archiving files
prior to that so could legitimately have had a null archive_command.
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
From | Date | Subject | |
---|---|---|---|
Next Message | Simon Riggs | 2008-05-06 13:44:04 | Re: Verified fix for Bug 4137 |
Previous Message | Heikki Linnakangas | 2008-05-06 11:02:56 | Re: Verified fix for Bug 4137 |