Re: PITR Backups

From: Dan Gorman <dgorman(at)hi5(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: "Koichi Suzuki" <suzuki(dot)koichi(at)oss(dot)ntt(dot)co(dot)jp>, "Toru SHIMOGAKI" <shimogaki(dot)toru(at)oss(dot)ntt(dot)co(dot)jp>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: PITR Backups
Date: 2007-06-25 15:28:51
Message-ID: D9E96F55-3AF6-4DC7-A68A-251567976E88@hi5.com
Lists: pgsql-performance

I took several snapshots. In all cases the filesystem was fine. In one
case, though, recovery appeared to find outstanding pages that still
had to be written to disk, as shown in the log below, and the database
would not start.

Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [9-1] 2007-06-21
00:39:43 PDTLOG: redo done at 71/99870670
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [10-1] 2007-06-21
00:39:43 PDTWARNING: page 28905 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [11-1] 2007-06-21
00:39:43 PDTWARNING: page 13626 of relation 1663/16384/76716 did not
exist
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [12-1] 2007-06-21
00:39:43 PDTWARNING: page 28904 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [13-1] 2007-06-21
00:39:43 PDTWARNING: page 26711 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [14-1] 2007-06-21
00:39:43 PDTWARNING: page 28900 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [15-1] 2007-06-21
00:39:43 PDTWARNING: page 3535208 of relation 1663/16384/33190 did
not exist
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [16-1] 2007-06-21
00:39:43 PDTWARNING: page 28917 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [17-1] 2007-06-21
00:39:43 PDTWARNING: page 3535207 of relation 1663/16384/33190 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [18-1] 2007-06-21
00:39:43 PDTWARNING: page 28916 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [19-1] 2007-06-21
00:39:43 PDTWARNING: page 28911 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [20-1] 2007-06-21
00:39:43 PDTWARNING: page 26708 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [21-1] 2007-06-21
00:39:43 PDTWARNING: page 28914 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [22-1] 2007-06-21
00:39:43 PDTWARNING: page 28909 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [23-1] 2007-06-21
00:39:43 PDTWARNING: page 28908 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [24-1] 2007-06-21
00:39:43 PDTWARNING: page 28913 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [25-1] 2007-06-21
00:39:43 PDTWARNING: page 26712 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [26-1] 2007-06-21
00:39:43 PDTWARNING: page 28918 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [27-1] 2007-06-21
00:39:43 PDTWARNING: page 28912 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [28-1] 2007-06-21
00:39:43 PDTWARNING: page 3535209 of relation 1663/16384/33190 did
not exist
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [29-1] 2007-06-21
00:39:43 PDTWARNING: page 28907 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [30-1] 2007-06-21
00:39:43 PDTWARNING: page 28906 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [31-1] 2007-06-21
00:39:43 PDTWARNING: page 26713 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [32-1] 2007-06-21
00:39:43 PDTWARNING: page 17306 of relation 1663/16384/76710 did not
exist
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [33-1] 2007-06-21
00:39:43 PDTWARNING: page 26706 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [34-1] 2007-06-21
00:39:43 PDTWARNING: page 800226 of relation 1663/16384/33204 did
not exist
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [35-1] 2007-06-21
00:39:43 PDTWARNING: page 28915 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [36-1] 2007-06-21
00:39:43 PDTWARNING: page 26710 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [37-1] 2007-06-21
00:39:43 PDTWARNING: page 28903 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [38-1] 2007-06-21
00:39:43 PDTWARNING: page 28902 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [39-1] 2007-06-21
00:39:43 PDTWARNING: page 28910 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [40-1] 2007-06-21
00:39:43 PDTPANIC: WAL contains references to invalid pages
Jun 21 00:39:43 sfmedstorageha001 postgres[3503]: [1-1] 2007-06-21
00:39:43 PDTLOG: startup process (PID 3506) was terminated by signal 6
Jun 21 00:39:43 sfmedstorageha001 postgres[3503]: [2-1] 2007-06-21
00:39:43 PDTLOG: aborting startup due to startup process failure
Jun 21 00:39:43 sfmedstorageha001 postgres[3505]: [1-1] 2007-06-21
00:39:43 PDTLOG: logger shutting down
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [1-1] 2007-06-21
00:40:39 PDTLOG: database system was interrupted while in recovery
at 2007-06-21 00:36:40 PDT
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [1-2] 2007-06-21
00:40:39 PDTHINT: This probably means that some data is corrupted
and you will have to use the last backup for
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [1-3] recovery.
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [2-1] 2007-06-21
00:40:39 PDTLOG: checkpoint record is at 71/9881E928
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [3-1] 2007-06-21
00:40:39 PDTLOG: redo record is at 71/986BF148; undo record is at
0/0; shutdown FALSE
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [4-1] 2007-06-21
00:40:39 PDTLOG: next transaction ID: 0/2871389429; next OID: 83795
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [5-1] 2007-06-21
00:40:39 PDTLOG: next MultiXactId: 1; next MultiXactOffset: 0
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [6-1] 2007-06-21
00:40:39 PDTLOG: database system was not properly shut down;
automatic recovery in progress
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [7-1] 2007-06-21
00:40:39 PDTLOG: redo starts at 71/986BF148
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [8-1] 2007-06-21
00:40:39 PDTLOG: record with zero length at 71/998706A8
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [9-1] 2007-06-21
00:40:39 PDTLOG: redo done at 71/99870670
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [10-1] 2007-06-21
00:40:39 PDTWARNING: page 28905 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [11-1] 2007-06-21
00:40:39 PDTWARNING: page 13626 of relation 1663/16384/76716 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [12-1] 2007-06-21
00:40:39 PDTWARNING: page 28904 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [13-1] 2007-06-21
00:40:39 PDTWARNING: page 26711 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [14-1] 2007-06-21
00:40:39 PDTWARNING: page 28900 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [15-1] 2007-06-21
00:40:39 PDTWARNING: page 3535208 of relation 1663/16384/33190 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [16-1] 2007-06-21
00:40:39 PDTWARNING: page 28917 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [17-1] 2007-06-21
00:40:39 PDTWARNING: page 3535207 of relation 1663/16384/33190 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [18-1] 2007-06-21
00:40:39 PDTWARNING: page 28916 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [19-1] 2007-06-21
00:40:39 PDTWARNING: page 28911 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [20-1] 2007-06-21
00:40:39 PDTWARNING: page 26708 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [21-1] 2007-06-21
00:40:39 PDTWARNING: page 28914 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [22-1] 2007-06-21
00:40:39 PDTWARNING: page 28909 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [23-1] 2007-06-21
00:40:39 PDTWARNING: page 28908 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [24-1] 2007-06-21
00:40:39 PDTWARNING: page 28913 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [25-1] 2007-06-21
00:40:39 PDTWARNING: page 26712 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [26-1] 2007-06-21
00:40:39 PDTWARNING: page 28918 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [27-1] 2007-06-21
00:40:39 PDTWARNING: page 28912 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [28-1] 2007-06-21
00:40:39 PDTWARNING: page 3535209 of relation 1663/16384/33190 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [29-1] 2007-06-21
00:40:39 PDTWARNING: page 28907 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [30-1] 2007-06-21
00:40:39 PDTWARNING: page 28906 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [31-1] 2007-06-21
00:40:39 PDTWARNING: page 26713 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [32-1] 2007-06-21
00:40:39 PDTWARNING: page 17306 of relation 1663/16384/76710 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [33-1] 2007-06-21
00:40:39 PDTWARNING: page 26706 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [34-1] 2007-06-21
00:40:39 PDTWARNING: page 800226 of relation 1663/16384/33204 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [35-1] 2007-06-21
00:40:39 PDTWARNING: page 28915 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [36-1] 2007-06-21
00:40:39 PDTWARNING: page 26710 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [37-1] 2007-06-21
00:40:39 PDTWARNING: page 28903 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [38-1] 2007-06-21
00:40:39 PDTWARNING: page 28902 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [39-1] 2007-06-21
00:40:39 PDTWARNING: page 28910 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [40-1] 2007-06-21
00:40:39 PDTPANIC: WAL contains references to invalid pages
Jun 21 00:40:39 sfmedstorageha001 postgres[3755]: [1-1] 2007-06-21
00:40:39 PDTLOG: startup process (PID 3757) was terminated by signal 6
Jun 21 00:40:39 sfmedstorageha001 postgres[3755]: [2-1] 2007-06-21
00:40:39 PDTLOG: aborting startup due to startup process failure
Jun 21 00:40:39 sfmedstorageha001 postgres[3756]: [1-1] 2007-06-21
00:40:39 PDTLOG: logger shutting down
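
To make the excerpt above easier to read, the invalid-page warnings can be tallied per relation. This is only a minimal sketch, not something taken from the thread: the function name tally_invalid_pages and the two-entry sample are illustrative assumptions, and the regular expression is written against the wrapped syslog format shown here (where the prefix runs straight into the severity, as in "PDTWARNING:").

```python
import re
from collections import Counter

# Matches the invalid-page warnings once wrapped lines have been rejoined.
PAGE_WARNING = re.compile(
    r"WARNING:\s*page (\d+) of relation (\d+/\d+/\d+) "
    r"(was uninitialized|did not exist)"
)


def tally_invalid_pages(log_text):
    """Count invalid-page warnings per (relation, reason) pair."""
    # Collapse all whitespace so entries wrapped by the mail client
    # ("did\nnot exist") become a single line again.
    flat = " ".join(log_text.split())
    counts = Counter()
    for _page, relation, reason in PAGE_WARNING.findall(flat):
        counts[(relation, reason)] += 1
    return counts


# Two entries copied from the first recovery attempt in the log.
sample = """\
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [10-1] 2007-06-21
00:39:43 PDTWARNING: page 28905 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [11-1] 2007-06-21
00:39:43 PDTWARNING: page 13626 of relation 1663/16384/76716 did not
exist
"""

for (relation, reason), n in sorted(tally_invalid_pages(sample).items()):
    print(relation, reason, n)
```

Feeding the full excerpt through the same function would show which relations account for most of the warnings, and whether a given page is reported as "did not exist" on the first attempt and "was uninitialized" on the second.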

On Jun 25, 2007, at 6:26 AM, Simon Riggs wrote:

> On Mon, 2007-06-25 at 19:06 +0900, Koichi Suzuki wrote:
>
>> Yeah, I agree we should carefully follow how Dan really did a
>> backup.
>
>> My point is PostgreSQL may have to extend the file during the hot
>> backup
>> to write to the new block.
>
> If the snapshot is a consistent, point-in-time copy then I don't
> see how
> any I/O at all makes a difference. To my knowledge, both EMC and
> NetApp
> produce snapshots like this. IIRC, EMC calls these instant snapshots,
> NetApp calls them frozen snapshots.
>
>> It is slightly different from Oracle's case.
>> Oracle allocates all the database space in advance so that there
>> could
>> be no risk to modify the metadata on the fly.
>
> Not really sure it's different.
>
> Oracle allows dynamic file extensions and I've got no evidence that
> file
> extension is prevented from occurring during backup simply as a result
> of issuing the start hot backup command.
>
> Oracle and DB2 both support a stop-I/O-to-the-database mode. My
> understanding is that isn't required any more if you do an instant
> snapshot, so if people are using instant snapshots it should certainly
> be the case that they are safe to do this with PostgreSQL also.
>
> Oracle is certainly more picky about snapshotted files than PostgreSQL
> is. In Oracle, each file has a header with the LSN of the last
> checkpoint in it. This is used at recovery time to ensure the
> backup is
> consistent by having exactly equal LSNs across all files. PostgreSQL
> doesn't use file headers and we don't store the LSN on a per-file
> basis,
> though we do store the LSN in the control file for the whole server.
>
>> In our case, because SAN
>> based storage snapshot is device level, not file system level, even a
>> file system does not know that the snapshot is being taken and we
>> might
>> encounter the case where metadata and/or user data are not
>> consistent.
>> Such snapshot (whole filesystem) might be corrupted and cause file
>> system level error.
>>
>> I'm interested in this. Any further comment/opinion is welcome.
>
> If you can show me either
>
> i) an error that occurs after the full and correct PostgreSQL hot
> backup
> procedures have been executed, or
>
> ii) a conjecture that explains in detail how a device level
> error might occur
>
> then I will look into this further.
>
> --
> Simon Riggs
> EnterpriseDB http://www.enterprisedb.com
>
>
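
Since the quoted reply turns on LSNs, here is a rough sketch of how the LSN values printed in the recovery log (redo starts at 71/986BF148, redo done at 71/99870670, zero-length record at 71/998706A8) can be parsed and compared. The helper names parse_lsn and offset_gap are hypothetical, and the sketch assumes both halves of an LSN are hexadecimal and that offsets are only subtracted when the first half matches.

```python
def parse_lsn(text):
    """Split an LSN such as '71/99870670' into (log id, offset), both hex."""
    log_id, offset = text.split("/")
    return int(log_id, 16), int(offset, 16)


def offset_gap(a, b):
    """Bytes from LSN a to LSN b; assumes both share the same log id."""
    if a[0] != b[0]:
        raise ValueError("LSNs are in different log ids")
    return b[1] - a[1]


# Values taken from the log excerpt in this message.
redo_start = parse_lsn("71/986BF148")
redo_done = parse_lsn("71/99870670")
zero_length = parse_lsn("71/998706A8")

# Tuples compare the log id first, then the offset, which follows WAL order.
assert redo_start < redo_done < zero_length

print(offset_gap(redo_start, redo_done))   # WAL replayed during redo, in bytes
print(offset_gap(redo_done, zero_length))  # distance to the end-of-WAL marker
```

This only orders and subtracts positions. It says nothing about whether the pages those records refer to exist in the snapshot, which is what the PANIC above is about.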
