From: Daniel Farina <daniel(at)heroku(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: hot backups: am I doing it wrong, or do we have a problem with pg_clog?
Date: 2011-04-21 11:15:48
Message-ID: BANLkTi=j-k3QOFKpjxUG5m0FtihANz3tOw@mail.gmail.com
Lists: pgsql-hackers
To start at the end of this story: "DETAIL: Could not read from file
"pg_clog/007D" at offset 65536: Success."
This is a message we received on a standby that we were bringing
online as part of a test. The clog file was present, but apparently
too small for Postgres to read (or at least that is what I think the
message meant), so one could stub in another clog file and then
continue recovery successfully (modulo the voodoo of stubbing in clog
files in general; a rough sketch of what I mean appears further down).
I am unsure whether this is due to an interesting race condition in
Postgres or a result of my somewhat-interesting hot-backup protocol,
which is slightly more involved than the norm. I will describe what it
does here:
1) call pg_start_backup()
2) crawl the entire Postgres cluster directory structure, except
pg_xlog, noting the size of every file present
3) begin writing tar files, but *only up to the size noted during the
original crawl of the cluster directory*, so if a file grows between
the original snapshot and the subsequent read() of the file, those
extra bytes will not be added to the tar
3a) if a file has been partially truncated, I pad the tar member with
"\0" bytes up to the size sampled in step 2, as I am streaming the
tar file and cannot go back in the stream to adjust the member's size
(a rough sketch of steps 2 through 3a follows this list)
4) call pg_stop_backup()
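
Very roughly, steps 2 through 3a look like the following sketch
(Python; the names, such as CappedFile, are illustrative rather than
my actual code, and the pg_start_backup()/pg_stop_backup() bracketing
of steps 1 and 4 is left out):

import os
import tarfile


class CappedFile(object):
    """Yield exactly `size` bytes from `path`: bytes the file gained
    after its size was sampled are ignored, and a concurrently
    truncated file is padded out with NUL bytes (step 3a)."""

    def __init__(self, path, size):
        self.f = open(path, 'rb')
        self.remaining = size

    def read(self, n=-1):
        if self.remaining <= 0:
            return b''
        if n < 0 or n > self.remaining:
            n = self.remaining
        data = self.f.read(n)
        if len(data) < n:
            # The file shrank since its size was sampled: pad with \0.
            data += b'\0' * (n - len(data))
        self.remaining -= len(data)
        return data

    def close(self):
        self.f.close()


def stream_backup(cluster_dir, out_path):
    # Step 2: note the size of every file, skipping pg_xlog.
    sizes = {}
    for root, dirs, files in os.walk(cluster_dir):
        dirs[:] = [d for d in dirs if d != 'pg_xlog']
        for name in files:
            path = os.path.join(root, name)
            sizes[path] = os.path.getsize(path)

    # Step 3: stream tar members capped at the sampled sizes.
    with tarfile.open(out_path, 'w|') as tar:
        for path, size in sorted(sizes.items()):
            info = tar.gettarinfo(path)
            info.size = size
            capped = CappedFile(path, size)
            tar.addfile(info, capped)
            capped.close()

In practice the members are spread over many such tar streams of
bounded size so they can be compressed and uploaded in parallel, but
the capping and padding above is the part relevant to this report.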
The reason I go to this trouble is that I use many completely disjoint
tar files to do parallel compression, decompression, uploading, and
downloading of the base backup of the database, and I want to be able
to control the size of these files up-front. The need to stub in "\0"
bytes comes from a limitation of the tar format when dealing with
streaming archives, and the requirement to truncate files to the size
snapshotted in step 2 is to enable splitting the files between volumes
even in the presence of possible concurrent growth while I'm
performing the hot backup (for example, a handful of nearly-empty heap
files can rapidly grow due to a concurrent bulk load if I get unlucky,
which I do not intend to allow myself to be).
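
For reference, by "stubbing in" a clog file I mean something along
these lines, sketched rather than exact, and assuming the usual 256 kB
SLRU segment size (32 pages of 8 kB); whether a zero-filled page is
actually a safe thing to substitute is exactly the voodoo alluded to
above:

import os

BLCKSZ = 8192
SLRU_PAGES_PER_SEGMENT = 32
SEGMENT_SIZE = BLCKSZ * SLRU_PAGES_PER_SEGMENT   # 256 kB


def pad_clog_segment(path):
    """Extend a too-short clog segment with NUL bytes so recovery can
    read past the missing pages, e.g. pad_clog_segment('pg_clog/007D')."""
    missing = SEGMENT_SIZE - os.path.getsize(path)
    if missing > 0:
        with open(path, 'ab') as f:
            f.write(b'\0' * missing)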
Any ideas? Or does it sound like I'm making some bookkeeping errors
and should review my code again? It does work most of the time; I have
not yet gotten a sense of how often this reproduces.
--
fdr