From: | Scott Whitney <scott(at)journyx(dot)com> |
---|---|
To: | Scott Whitney <swhitney(at)journyx(dot)com> |
Cc: | postgres admin <pgsql-admin(at)postgresql(dot)org> |
Subject: | Re: Failing replication |
Date: | 2014-06-03 00:35:43 |
Message-ID: | 196857337.1802139.1401755743114.JavaMail.zimbra@mail.int.journyx.com |
Lists: | pgsql-admin |
rsync has completed. Cascading (#3 -> #1 via #2) is up and running again.
Log as of now:
LOG: consistent recovery state reached at 31C/1C979FD8
LOG: database system is ready to accept read only connections
LOG: record with zero length at 31C/1CCDF680
LOG: streaming replication successfully connected to primary
=== me....it's working 7:29 pm
=== scott again...restarting pg -> db01 at 6/2/14 7:33 just to see what happens.
LOG: database system was shut down in recovery at 2014-06-02 19:34:07 CDT
LOG: entering standby mode
LOG: redo starts at 31C/1F443578
LOG: consistent recovery state reached at 31C/20278980
LOG: record with incorrect prev-link 319/60278768 at 31C/20278980
LOG: database system is ready to accept read only connections
LOG: streaming replication successfully connected to primary
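The log positions above (for example `31C/1F443578`) are WAL locations written as two hexadecimal words. A minimal sketch of how to compare two of them, assuming the location can be treated as a single 64-bit value (high word shifted left 32 bits, plus the low word). That is how 9.3 and later define it; on 9.2 the last segment of each logical log file is skipped, so a difference that crosses a high-word boundary is only approximate. The helper names `lsn_to_int` and `lsn_diff` are made up for this illustration.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a WAL location such as '31C/1F443578' to an integer byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lsn_diff(start: str, end: str) -> int:
    """Bytes of WAL between two locations (positive when end is later)."""
    return lsn_to_int(end) - lsn_to_int(start)


# Values copied from the restart log above.
redo_start = "31C/1F443578"
consistent = "31C/20278980"

replayed = lsn_diff(redo_start, consistent)
print(f"WAL replayed before consistency: {replayed} bytes ({replayed / 2**20:.1f} MiB)")
```

Both positions here share the same high word, so the 9.2 caveat does not affect this particular difference.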
----- Original Message -----
> So, I have 3 servers:
> 1) Production.
> 2) "Production" training server. Replicates #1. At same data center. 2
> postmasters on separate ports.
> 3) Internal "production" server. 3 postmasters on separate ports. Replicates
> #2 and #1 (both from #2).
> This has happened to me several times in the past few weeks.
> From #3:
> LOG: entering standby mode
> LOG: redo starts at 31B/FDFFC0D8
> WARNING: page 7843 of relation base/25999670/33421425 is uninitialized
> PANIC: WAL contains references to invalid pages
> LOG: startup process (PID 4580) was terminated by signal 6: Aborted
> LOG: terminating any other active server processes
> ===
> here is me restarting...adding this since there are no timestamps in this
> log...
> ===
> LOG: database system was interrupted while in recovery at log time 2014-06-02
> 17:31:05 CDT
> HINT: If this has occurred more than once some data might be corrupted and
> you might need to choose an earlier recovery target.
> LOG: entering standby mode
> LOG: redo starts at 31B/FDFFC0D8
> WARNING: page 7843 of relation base/25999670/33421425 is uninitialized
> PANIC: WAL contains references to invalid pages
> LOG: startup process (PID 4838) was terminated by signal 6: Aborted
> LOG: terminating any other active server processes
> There is no activity in #2's pg_log postgresql-Mon.log file corresponding to
> this event.
> I have wal_keep_segments set to 1,000. I have pg_xlog files dated yesterday.
> This happened earlier today as well. It happened twice last week.
> The "warning page" stuff is different from the pg_controldata stuff earlier
> today.
> Here is pg_controldata from server 3 wrt server 1 replication.
> pg_control version number: 922
> Catalog version number: 201204301
> Database system identifier: 5868266298097360799
> Database cluster state: in archive recovery
> pg_control last modified: Mon Jun 2 18:42:18 2014
> Latest checkpoint location: 31B/FE08DB50
> Prior checkpoint location: 31B/FE08DB50
> Latest checkpoint's REDO location: 31B/FDFFC0D8
> Latest checkpoint's TimeLineID: 1
> Latest checkpoint's full_page_writes: on
> Latest checkpoint's NextXID: 0/441709396
> Latest checkpoint's NextOID: 33432042
> Latest checkpoint's NextMultiXactId: 28077
> Latest checkpoint's NextMultiOffset: 57507
> Latest checkpoint's oldestXID: 241800232
> Latest checkpoint's oldestXID's DB: 16573
> Latest checkpoint's oldestActiveXID: 441709396
> Time of latest checkpoint: Mon Jun 2 17:31:05 2014
> Minimum recovery ending location: 31C/4E1C18
> Backup start location: 0/0
> Backup end location: 0/0
> End-of-backup record required: no
> Current wal_level setting: hot_standby
> Current max_connections setting: 4000
> Current max_prepared_xacts setting: 0
> Current max_locks_per_xact setting: 64
> Maximum data alignment: 8
> Database block size: 8192
> Blocks per segment of large relation: 131072
> WAL block size: 8192
> Bytes per WAL segment: 16777216
> Maximum length of identifiers: 64
> Maximum columns in an index: 32
> Maximum size of a TOAST chunk: 1996
> Date/time type storage: 64-bit integers
> Float4 argument passing: by value
> Float8 argument passing: by value
> I have to redo the base backup and bring it back up again, so the rest of the
> stuff is about to be rsynced away, but it has now failed twice today. It
> might well fail again.
> What more should I provide to try to find a solution?
> Server 2 -> 1 replication has no issues.
> Server 3 -> 2 (replicating server 1) is the one that's failing.
> 65Mbps line between them.
> PG 9.2.8 across the board, on CentOS 6.5 64-bit.
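For a rough sense of scale, the figures quoted above can be combined: 1,000 retained WAL segments, 16777216 bytes per segment (from the pg_controldata output), and a 65Mbps link. This is only a back-of-the-envelope sketch. It assumes the "1,000" refers to `wal_keep_segments`, that "Mbps" means 10^6 bits per second, and that the link is fully available to replication, none of which the message confirms.

```python
SEGMENT_BYTES = 16777216   # "Bytes per WAL segment" from pg_controldata
KEPT_SEGMENTS = 1000       # assumed to be wal_keep_segments on the upstream
LINK_MBPS = 65             # assumed to mean 65 * 10**6 bits per second

# Total WAL the upstream keeps around for a lagging standby.
retained_bytes = SEGMENT_BYTES * KEPT_SEGMENTS

# Time to ship that whole window over the link at full line rate.
transfer_seconds = retained_bytes * 8 / (LINK_MBPS * 1_000_000)

print(f"Retained WAL: {retained_bytes / 2**30:.1f} GiB")
print(f"Time to transfer at line rate: {transfer_seconds / 60:.0f} minutes")
```

If the standby falls behind by more than the retained window, it can no longer catch up by streaming alone and needs archived WAL or a fresh base backup.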