From: | Linas Virbalas <linas(dot)virbalas(at)continuent(dot)com> |
---|---|
To: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Cc: | "daniel(at)heroku(dot)com" <daniel(at)heroku(dot)com> |
Subject: | Hot Backup with rsync fails at pg_clog if under load |
Date: | 2011-09-21 14:44:30 |
Message-ID: | CA9FD2FE.1D8D2%linas.virbalas@continuent.com |
Lists: | pgsql-hackers |
Hello,
* Context *
I'm observing problems when provisioning a standby from the master by
following the basic, documented "Making a Base Backup" [1] procedure with
rsync if, in the meantime, heavy load is applied to the master.
After searching the archives, the only similar issue I found that was
discussed at any length was reported by Daniel Farina in the thread "hot
backups: am I doing it wrong, or do we have a problem with pg_clog?" [2],
but it seems the issue was dismissed because of the non-standard backup
procedure Daniel used.
However, I'm observing the same error with a simple procedure, hence this
message.
* Details *
Procedure:
1. Start load generator on the master (WAL archiving enabled).
2. Prepare a Streaming Replication standby (accepting WAL files too):
2.1. pg_switch_xlog() on the master;
2.2. pg_start_backup('backup_under_load') on the master (this will take a
while as the master is under load);
2.3. rsync data/global/pg_control to the standby;
2.4. rsync all other data/ (without pg_xlog) to the standby;
2.5. pg_stop_backup() on the master;
2.6. Wait to receive all WAL files, generated during the backup, on the
standby;
2.7. Start the standby PG instance.
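For readers who want to reproduce this, steps 2.1 to 2.5 can be sketched as a
short Python helper that only builds the command lines, in order, without
running them. This is an illustration, not the exact script used here: the
function name, host names, data directory and the rsync/psql options chosen
are all assumptions.

```python
import shlex


def base_backup_commands(master_host, standby_host, data_dir,
                         label="backup_under_load"):
    """Return shell commands for steps 2.1-2.5, in order, without running them.

    Hypothetical helper: host names, paths and option choices are
    illustrative assumptions, not taken from the original procedure.
    """
    def psql(sql):
        # -A/-t give unaligned, tuples-only output; -c runs a single command.
        return "psql -h {} -At -c {}".format(
            shlex.quote(master_host), shlex.quote(sql))

    src = data_dir.rstrip("/")
    dest = "{}:{}".format(standby_host, src)
    return [
        psql("SELECT pg_switch_xlog()"),                     # 2.1
        psql("SELECT pg_start_backup('{}')".format(label)),  # 2.2
        # 2.3: pg_control first (assumes global/ exists on the standby)
        "rsync -a {}/global/pg_control {}/global/".format(src, dest),
        # 2.4: everything else, skipping the WAL directory
        "rsync -a --exclude=pg_xlog {}/ {}/".format(src, dest),
        psql("SELECT pg_stop_backup()"),                     # 2.5
    ]


if __name__ == "__main__":
    for cmd in base_backup_commands("master", "standby", "/data"):
        print(cmd)
```

Steps 2.6 and 2.7 (waiting for the archived WAL and starting the standby)
are left out because they depend on the local archive setup.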
The last step usually fails with an error similar to this:
2011-09-21 13:41:05 CEST LOG: database system was interrupted; last known
up at 2011-09-21 13:40:50 CEST
Restoring 00000014.history
mv: cannot stat `/opt/PostgreSQL/9.1/archive/00000014.history': No such file
or directory
Restoring 00000013.history
2011-09-21 13:41:05 CEST LOG: restored log file "00000013.history" from
archive
2011-09-21 13:41:05 CEST LOG: entering standby mode
Restoring 0000001300000006000000DC
2011-09-21 13:41:05 CEST LOG: restored log file "0000001300000006000000DC"
from archive
Restoring 0000001300000006000000DB
2011-09-21 13:41:05 CEST LOG: restored log file "0000001300000006000000DB"
from archive
2011-09-21 13:41:05 CEST FATAL: could not access status of transaction
1188673
2011-09-21 13:41:05 CEST DETAIL: Could not read from file "pg_clog/0001" at
offset 32768: Success.
2011-09-21 13:41:05 CEST LOG: startup process (PID 13819) exited with exit
code 1
2011-09-21 13:41:05 CEST LOG: aborting startup due to startup process
failure
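As a cross-check of the FATAL message, the file name and offset can be derived
from the transaction ID. The sketch below assumes a default build (8 kB
blocks, 2 commit-status bits per transaction, 32 pages per SLRU segment); the
constant names mirror those in the PostgreSQL source, but the function itself
is only illustrative.

```python
BLCKSZ = 8192                    # default block size, in bytes (assumed)
CLOG_XACTS_PER_BYTE = 4          # 2 status bits per transaction
CLOG_XACTS_PER_PAGE = BLCKSZ * CLOG_XACTS_PER_BYTE  # 32768 xids per page
SLRU_PAGES_PER_SEGMENT = 32      # pages per pg_clog segment file


def clog_location(xid):
    """Map a transaction ID to (segment file, byte offset of its page)."""
    page = xid // CLOG_XACTS_PER_PAGE
    segment = page // SLRU_PAGES_PER_SEGMENT
    offset = (page % SLRU_PAGES_PER_SEGMENT) * BLCKSZ
    return "pg_clog/{:04X}".format(segment), offset


print(clog_location(1188673))  # ('pg_clog/0001', 32768)
```

Under these assumptions, transaction 1188673 lives on the page that starts at
offset 32768 of pg_clog/0001, which matches the log. The "Success" errno is
consistent with a short read, that is, the copied file on the standby being
shorter than 40960 bytes, although the log alone does not confirm this.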
The procedure works very reliably if there is little or no load on the
master, but fails very often with the pg_clog error when the load generator
(a few thousand SELECTs, ~60 INSERTs, ~60 DELETEs and ~60 UPDATEs per
second) is running.
I assumed that a file system backup taken between pg_start_backup and
pg_stop_backup is guaranteed to be consistent, and that any missing pieces
will be taken from the WAL files generated and shipped during the backup,
but is that really the case?
Is this procedure missing some steps? Or is this a known issue?
Thank you,
Linas
[1] http://www.postgresql.org/docs/current/static/continuous-archiving.html
[2] http://archives.postgresql.org/pgsql-hackers/2011-04/msg01132.php