Re: hot backups: am I doing it wrong, or do we have a problem with pg_clog?

From: Daniel Farina <daniel(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: hot backups: am I doing it wrong, or do we have a problem with pg_clog?
Date: 2011-04-21 16:05:50
Message-ID: BANLkTin03DyWL7MPOK5Kq7m5YX6gnArSgA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Apr 21, 2011 at 8:19 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Apr 21, 2011 at 7:15 AM, Daniel Farina <daniel(at)heroku(dot)com> wrote:
>> To start at the end of this story: "DETAIL:  Could not read from file
>> "pg_clog/007D" at offset 65536: Success."
>>
>> This is a message we received on a standby that we were bringing
>> online as part of a test.  The clog file was present, but apparently
>> too small for Postgres (or at least I think this is what the message
>> meant), so one could stub in another clog file and then continue
>> recovery successfully (modulo the voodoo of stubbing in clog files in
>> general).  I am unsure if this is due to an interesting race condition
>> in Postgres or a result of my somewhat-interesting hot-backup
>> protocol, which is slightly more involved than the norm.  I will
>> describe what it does here:
>>
>> 1) Call pg start backup
>> 2) crawl the entire postgres cluster directory structure, except
>> pg_xlog, taking notes of the size of every file present
>> 3) begin writing TAR files, but *only up to the size noted during the
>> original crawling of the cluster directory,* so if the file grows
>> between the original snapshot and subsequently actually calling read()
>> on the file those extra bytes will not be added to the TAR.
>>  3a) If a file is truncated partially, I add "\0" bytes to pad the
>> tarfile member up to the size sampled in step 2, as I am streaming the
>> tar file and cannot go back in the stream and adjust the tarfile
>> member size
>> 4) call pg stop backup
>
> In theory I would expect any defects introduced by the, ahem,
> exciting, procedure described in steps 3 and 3a to be corrected by
> recovery automatically when you start the new cluster.

Neat. This is mostly what I was looking to get out of this thread; I
will start looking for places where I have botched things.

Although some of the frontend interface and some of the mechanism are
embarrassingly rough for several reasons, the other thread posters can
have access to the code if they wish: the code responsible for these
shenanigans can be found at https://github.com/heroku/wal-e (and
https://github.com/fdr/wal-e) in the tar_partition.py file.
(https://github.com/heroku/WAL-E/blob/master/wal_e/tar_partition.py)
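For concreteness, here's a minimal sketch of the size-capping trick
from steps 3 and 3a (not the actual wal-e code; the helper name is
invented): the tar header advertises the size sampled at scan time,
the read is capped at that size, and a file truncated in the meantime
is padded out with NUL bytes:

import io
import tarfile

def add_capped_member(tar, path, snapshot_size):
    # Emit exactly snapshot_size bytes for this member: bytes the file
    # gained since the scan are dropped, and bytes it lost since are
    # replaced with NULs, so the stream matches the header we wrote.
    # (Reads the whole file into memory; fine for a sketch only.)
    info = tar.gettarinfo(path)
    info.size = snapshot_size
    with open(path, 'rb') as f:
        data = f.read(snapshot_size)
    data += b'\0' * (snapshot_size - len(data))
    tar.addfile(info, io.BytesIO(data))

# Usage, bracketed by steps 1 and 4 of the protocol:
#   SELECT pg_start_backup('label');                    -- step 1
#   sizes = sample every file's size, skipping pg_xlog  # step 2
#   with tarfile.open('base.tar', 'w') as tar:          # step 3
#       for path, size in sizes.items():
#           add_capped_member(tar, path, size)
#   SELECT pg_stop_backup();                            -- step 4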

But I realize that's really too much detail for most people to be
interested in, which is why I didn't post it in the first place. I
think given your assessment I have enough to try to reproduce this
case synthetically (I think taking a very old pg_clog snapshot,
committing a few million xacts while not vacuuming, and then trying to
merge the old clog into an otherwise-newer base backup may prove out the
mechanism I have in mind) or add some more robust logging so I can
catch my (or any, really) problem.
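
Something like the following untested harness is what I have in mind
(psycopg2 and the specifics here are my own guesses, not anything
established in this thread):

import psycopg2

# 0) Take a base backup and squirrel away its pg_clog (the "old"
#    snapshot), with autovacuum disabled on the cluster.

# 1) Burn through a few million xids: with autocommit on, each
#    txid_current() call runs in its own transaction and so consumes
#    one xid.
conn = psycopg2.connect('dbname=postgres')
conn.autocommit = True
cur = conn.cursor()
for _ in range(3 * 1000 * 1000):
    cur.execute('SELECT txid_current()')

# 2) Take a fresh base backup, substitute the old pg_clog into it,
#    start a standby from the result, and watch for the short read:
#    'Could not read from file "pg_clog/..." at offset ...'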

--
fdr
