Re: str replication failed, restart fixed it

From: Willy-Bas Loos <willybas(at)gmail(dot)com>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: str replication failed, restart fixed it
Date: 2014-02-26 12:07:50
Message-ID: CAHnozTg4PAR-JLjwp++CNPDc2sM=VWhQfvD-=K_0XqpE2C1JOA@mail.gmail.com
Lists: pgsql-general

This is very probably an OpenVZ issue; it can be solved by lowering
shared_buffers considerably.
The restart worked because the server was in fact down. I think pg_lsclusters
showed the cluster as online because of a stale runfile.
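
To illustrate the stale-runfile suspicion, here is a minimal sketch of how one
could check whether a PID file still points at a live process. It assumes the
file holds the PID on its first line (as postmaster.pid does) and a POSIX
system; the function names are made up for this example, and this is not a
description of how pg_lsclusters itself decides on the status.

```python
import os


def read_pid(pid_file):
    """Return the PID on the first line of pid_file, or None if unusable."""
    try:
        with open(pid_file) as fh:
            pid = int(fh.readline().strip())
    except (OSError, ValueError):
        return None
    # PIDs of 0 or below have special meaning for kill(), so reject them.
    return pid if pid > 0 else None


def process_alive(pid):
    """Return True if a process with this PID currently exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def pid_file_is_stale(pid_file):
    """True if the file names a PID that no longer belongs to any process."""
    pid = read_pid(pid_file)
    return pid is not None and not process_alive(pid)
```

Calling pid_file_is_stale() on the cluster's postmaster.pid (its location
depends on the data directory) would return True in the situation I suspect:
the file is still there, but the process it names is gone.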

I was hoping that the memory allocation improvements in PostgreSQL 9.3 would
solve these issues, but this post makes me think that they won't:
http://www.postgresql.org/message-id/CAHyXU0xa5EgvjeH=4vp-eZDJdS5kMQuiDivvTRLjY-uZ62Y44w@mail.gmail.com

Does anyone know of a solution?

Cheers,

WBL

On Wed, Feb 26, 2014 at 10:53 AM, Willy-Bas Loos <willybas(at)gmail(dot)com> wrote:

> Hi,
>
> I had a problem today and I fixed it by restarting postgres.
> That doesn't make sense to me; what could have been going on?
>
> This is the log:
> 2014-02-26 04:30:45 CET db: ip: us: FATAL: could not send data to WAL
> stream: SSL error: sslv3 alert unexpected message
>
> cp: cannot stat
> `/data/postgresql/9.1/main/wal_archive/000000010000006400000062': No such
> file or directory
> 2014-02-26 04:30:45 CET db: ip: us: LOG: unexpected pageaddr 64/3FBC6000
> in log file 100, segment 98, offset 12345344
> cp: cannot stat
> `/data/postgresql/9.1/main/wal_archive/000000010000006400000062': No such
> file or directory
> 2014-02-26 04:30:45 CET db: ip: us: LOG: streaming replication
> successfully connected to primary
> 2014-02-26 04:32:09 CET db: ip: us: LOG: startup process (PID 5385) was
> terminated by signal 7: Bus error
> 2014-02-26 04:32:09 CET db: ip: us: LOG: terminating any other active
> server processes
>
> The cluster was "online" according to pg_lsclusters, but it was not
> possible to connect to it:
> psql: could not connect to server: No such file or directory
> Is the server running locally and accepting
> connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
>
> uptime tells me this:
> postgres(at)myserver:~$ uptime
> 10:47:27 up 89 days, 42 min, 1 user, load average: 0.00, 0.00, 0.00
>
> This is postgresql 9.1 on Ubuntu 12.04 on OpenVZ
>
> The weirdest thing is that restarting the postgres cluster fixed it.
> Does this make any sense to you?
>
> Cheers,
>
> WBL
> --
> "Quality comes from focus and clarity of purpose" -- Mark Shuttleworth
>

--
"Quality comes from focus and clarity of purpose" -- Mark Shuttleworth
