Re: Standby Server Bus 7 error

From: Fabio Milano <fmilano(at)crdsgroup(dot)com>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Standby Server Bus 7 error
Date: 2014-07-24 20:03:42
Message-ID: b1af9bd761734ab28801eb59329851f2@DM2PR07MB624.namprd07.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

-----Original Message-----
From: pgsql-general-owner(at)postgresql(dot)org [mailto:pgsql-general-owner(at)postgresql(dot)org] On Behalf Of Fabio Milano
Sent: Thursday, July 24, 2014 12:40 PM
To: pgsql-general(at)postgresql(dot)org
Subject: Re: [GENERAL] Standby Server Bus 7 error

-----Original Message-----
From: Michael Paquier [mailto:michael(dot)paquier(at)gmail(dot)com]
Sent: Wednesday, July 23, 2014 10:25 PM
To: Fabio Milano
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: [GENERAL] Standby Server Bus 7 error

On Thu, Jul 24, 2014 at 12:53 PM, Fabio Milano <fmilano(at)crdsgroup(dot)com> wrote:
> Any assistance in interpreting the logs is much appreciated.
> Replication server crashes. Below is snippet from log.
> 2014-07-22 23:36:23 EDT LOG: started streaming WAL from pr imary at
> 12/B0000000 on timeline 1
> 2014-07-22 23:43:12 EDT FATAL: could not receive data from WAL
> stream: server closed the connection unexpectedly
> This probably means the server terminated a bnormally
> before or while processing the request.
The replication stream has finished. Did you root server crash in some way?

> cp: cannot stat '/var/lib/postgresql/archive/00000001000000
> 12000000D0': No such file or directory
> 2014-07-22 23:43:12 EDT LOG: unexpected pageaddr 12/8D0000
> 00 in log segment 0000000100000012000000D0, offset 0
This means that server expected new WAL, but found instead an old record, something that happens when a WAL file is recycled and is not yet overwritten with new data. Did your archives get corrupted in some way by a crash of master?

> 2014-07-22 23:43:28 EDT LOG: restored log file "0000000100
> 000012000000D0" from archive
> cp: cannot stat '/var/lib/postgresql/archive/00000001000000
> 12000000D1': No such file or directory
> 2014-07-22 23:43:28 EDT LOG: unexpected pageaddr 12/A50000
> 00 in log segment 0000000100000012000000D1, offset 0
Same here.

> Upon trying a /etc/init.d/postgresql restart
> 2014-07-23 14:47:55 EDT LOG: restored log file "0000000 00001B0000001D"
> from archive
> 2014-07-23 14:47:55 EDT LOG: consistent recovery state ched at
> 1B/1DFC64C0
> 2014-07-23 14:47:55 EDT LOG: database system is ready t ccept read only
> connections
> 2014-07-23 14:47:55 EDT LOG: restored log file "0000000 00001B0000001E"
> from archive
> 2014-07-23 14:47:56 EDT LOG: startup process (PID 730) terminated by
> signal 7: Bus error
This may find its root to a hardware problem...

What is the version of 9.3 you are using btw?
--
Michael

Thank you Michael,

No report of crashing on the Master.

>> This means that server expected new WAL, but found instead an old record, something that happens when a WAL file is recycled and is not yet overwritten with new data.

Could this be caused by something happening by the developers on the Master?

The Archive command on the master is:
'rsync -az %p postgres(at)[ipaddress]:archive/%f'

>> This may find its root to a hardware problem...
The standby server is a VPS. Previously 9.0.13 replication was working .
9.0.13 was removed. OS upgraded to Ubuntu 14.04 and Postgres 9.3.4 installed Master new install OS Ubuntu 14.04 and new postgres 9.3.4 install

Version is:

psql (PostgreSQL) 9.3.4

Performed a manual resync yesterday however another crash this morning.

Standby - /etc/init.d/postgresql stop
Master - psql -c "select pg_start_backup('label', true)"
Master - rsync -avc --exclude postmaster.pid --exclude postgresql.pid /var/lib/postgresql/9.3/main/* [ip address]:/var/lib/postgresql/9.3/main/
Master - psql -c "SELECT pg_stop_backup()"
Standby - /etc/init.d/postgresql start

Another crash.

2014-07-24 09:58:31 EDT LOG: restored log file "000000010000003300000096" from archive
cp: cannot stat '/var/lib/postgresql/archive/000000010000003300000097': No such file or directory
2014-07-24 09:58:31 EDT LOG: unexpected pageaddr 33/5C000000 in log segment 000000010000003300000097, offset 0
2014-07-24 09:58:32 EDT LOG: started streaming WAL from primary at 33/97000000 on timeline 1
2014-07-24 10:03:50 EDT FATAL: could not receive data from WAL stream: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

cp: cannot stat '/var/lib/postgresql/archive/0000000100000033000000B7': No such file or directory
2014-07-24 10:03:50 EDT LOG: unexpected pageaddr 33/6A000000 in log segment 0000000100000033000000B7, offset 0
2014-07-24 10:04:06 EDT LOG: restored log file "0000000100000033000000B7" from archive
cp: cannot stat '/var/lib/postgresql/archive/0000000100000033000000B8': No such file or directory
2014-07-24 10:04:06 EDT LOG: unexpected pageaddr 33/76000000 in log segment 0000000100000033000000B8, offset 0
2014-07-24 10:04:06 EDT LOG: started streaming WAL from primary at 33/B8000000 on timeline 1
2014-07-24 10:06:24 EDT LOG: startup process (PID 18905) was terminated by signal 7: Bus error
2014-07-24 10:06:24 EDT LOG: terminating any other active server processes

/etc/init.d/postgresql restart

2014-07-24 15:19:55 EDT LOG: restored log file "000000010000003400000009" from archive
2014-07-24 15:19:57 EDT LOG: startup process (PID 22458) was terminated by signal 7: Bus error
2014-07-24 15:19:57 EDT LOG: terminating any other active server processes

--
Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org) To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

>> Confirmed with Developers:

>> We haven't done any rollbacks, and haven't seen any crashes.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Karsten Hilbert 2014-07-24 20:22:12 Re: Table checksum proposal
Previous Message Fabio Milano 2014-07-24 19:40:07 Re: Standby Server Bus 7 error