Re: BUG #7753: Cannot promote out of hot standby

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: daniel(at)heroku(dot)com
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #7753: Cannot promote out of hot standby
Date: 2013-03-26 11:08:27
Message-ID: 515181AB.8090302@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

(cleaning up my inbox..)

Did you ever figure out this one?

On 12.12.2012 23:36, daniel(at)heroku(dot)com wrote:
> The following bug has been logged on the website:
>
> Bug reference: 7753
> Logged by: Daniel Farina
> Email address: daniel(at)heroku(dot)com
> PostgreSQL version: 9.1.6
> Operating system: Ubuntu 10.04
> Description:
>
> Touching a trigger file will not cause promotion out of hot standby.
> Basically, an apparently normally-working hot-standby database will
> not leave hot standby. The database emitting WAL is version 9.1.4.
>
> Everything appears normal in the log (downloads and restoring of
> archived segments), and the server seems to take no notice of the
> trigger file.
>
> To force the issue, I introduced an error into the configuration of
> the restoration program to cause it to exit. Normally that's no
> problem; postgres would just keep on trying to restore a segment over
> and over until the error is fixed.
>
> Instead, the server crashes:
>
> [413-1] [COPPER] LOG: restored log file "000000010000034D00000050" from
> archive
> wal_e.worker.s3_worker INFO MSG: completed download and
> decompression#012 DETAIL: Downloaded and decompressed
> "s3://archive-root/wal_005/000000010000034D00000051.lzo" to
> "pg_xlog/RECOVERYXLOG"
> [414-1] [COPPER] LOG: restored log file "000000010000034D00000051" from
> archive
> wal_e.worker.s3_worker INFO MSG: completed download and
> decompression#012 DETAIL: Downloaded and decompressed
> "s3://archive-root/wal_005/000000010000034D00000052.lzo" to
> "pg_xlog/RECOVERYXLOG"
> [415-1] [COPPER] LOG: restored log file "000000010000034D00000052" from
> archive
>
> # I introduce the failure here
>
> wal_e.main ERROR MSG: no AWS_SECRET_ACCESS_KEY defined#012 HINT:
> Define the environment variable AWS_SECRET_ACCESS_KEY.
> LOG: trigger file found: /etc/postgresql/wal-e.d/pull-env/STANDBY_OFF
> LOG: redo done at 34D/52248590
> LOG: last completed transaction was at log time 2012-12-10 wal_e.main
> ERROR MSG: no AWS_SECRET_ACCESS_KEY defined#012 HINT: Define the
> environment variable AWS_SECRET_ACCESS_KEY.
> PANIC: could not open file "pg_xlog/000000010000034D00000052" (log file
> 845, segment 82): No such file or directory
> LOG: startup process (PID 7) was terminated by signal 6: Aborted
> LOG: terminating any other active server processes
> WARNING: terminating connection because of crash of another server process
> WARNING: terminating connection because of crash of another server process
> DETAIL: The postmaster has commanded this server process to roll back the
> current transaction and exit, because another server process exited
> abnormally and possibly corrupted shared memory.
> DETAIL: The postmaster has commanded this server process to roll back the
> current transaction and exit, because another server process exited
> abnormally and possibly corrupted shared memory.
>
> I can fix the configuration and restart the server, and everything is
> as fine as before. Next, I try removing recovery.conf and restarting
> the server as an alternative way of promoting...but, no avail;
> however, a slightly different error message:
>
> # Server begins starting
> LOG: loaded library "auto_explain"
> LOG: loaded library "pg_stat_statements"
> LOG: database system was interrupted while in recovery at log time
> 2012-12-10 15:20:03 UTC
> HINT: If this has occurred more than once some data might be corrupted and
> you might need to choose an earlier recovery target.
> LOG: could not open file "pg_xlog/000000010000034E0000001A" (log file 846,
> segment 26): No such file or directory
> LOG: invalid primary checkpoint record
> LOG: could not open file "pg_xlog/000000010000034D000000F2" (log file 845,
> segment 242): No such file or directory
> LOG: invalid secondary checkpoint record
> PANIC: could not locate a valid checkpoint record
> LOG: startup process (PID 7) was terminated by signal 6: Aborted
> LOG: aborting startup due to startup process failure
> main process (24284) terminated with status 1
>
> pg_control looks like this around the same time, for reference:
>
> pg_control version number: 903
> Catalog version number: 201105231
> Database cluster state: in archive recovery
> pg_control last modified: Wed 12 Dec 2012 09:22:30 PM UTC
> Latest checkpoint location: 351/1FE194C0
> Prior checkpoint location: 351/FD64A78
> Latest checkpoint's REDO location: 351/131848C8
> Latest checkpoint's TimeLineID: 1
> Latest checkpoint's NextXID: 0/652342033
> Latest checkpoint's NextOID: 103224
> Latest checkpoint's NextMultiXactId: 1
> Latest checkpoint's NextMultiOffset: 0
> Latest checkpoint's oldestXID: 455900714
> Latest checkpoint's oldestXID's DB: 16385
> Latest checkpoint's oldestActiveXID: 652311442
> Time of latest checkpoint: Mon 10 Dec 2012 07:19:23 PM UTC
> Minimum recovery ending location: 351/4BFFFE20
> Backup start location: 0/0
> Current wal_level setting: hot_standby
> Current max_connections setting: 500
> Current max_prepared_xacts setting: 500
> Current max_locks_per_xact setting: 64
> Maximum data alignment: 8
> Database block size: 8192
> Blocks per segment of large relation: 131072
> WAL block size: 8192
> Bytes per WAL segment: 16777216
> Maximum length of identifiers: 64
> Maximum columns in an index: 32
> Maximum size of a TOAST chunk: 1996
> Date/time type storage: 64-bit integers
> Float4 argument passing: by value
> Float8 argument passing: by value
>
> In the course of all this messing around, it has never been a problem to go
> back to archive recovery.
>
>
>

--
- Heikki

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Bruce Momjian 2013-03-26 17:20:51 Re: JRV problem with pg_upgrade
Previous Message webadmin 2013-03-26 09:06:32 BUG #7998: Could not able to connect database