Re: Next steps in debugging database storage problems?

From: Jacob Bunk Nielsen <jacob(at)bunk(dot)cc>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Next steps in debugging database storage problems?
Date: 2014-07-03 08:26:02
Message-ID: spamdrop+87egy2x2zp.fsf@atom.bunk.cc
Lists: pgsql-general

Hi

Jacob Bunk Nielsen <jacob(at)bunk(dot)cc> writes:

> We have a PostgreSQL 9.3.4 running in an LXC container on Debian
> Wheezy on a Linux 3.10.43 kernel on a Dell R620 server. Data are
> stored on a XFS file system. We are seeing problems such as:
>
> unexpected data beyond EOF in block 2 of relation base/805208133/1238511128
>
> and
>
> could not read block 5 in file "base/805208348/1259338118": read only
> 0 of 8192 bytes
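
For anyone reading these messages: the path has the form base/<database OID>/<relfilenode>, so the affected database and relation can be looked up in the catalogs. The following is only a minimal sketch of that decoding, not something from the original report. The helper names (parse_relation_path, parse_block, block_offset, lookup_queries) are made up for illustration, the 8192-byte block size is taken from the second message, and the pg_class query has to be run while connected to the database that the first query names.

```python
import re

# Path form used in the messages above: base/<database OID>/<relfilenode>
REL_PATH = re.compile(r"base/(\d+)/(\d+)")

# Block size as reported in the message ("0 of 8192 bytes"); assumed default build.
BLCKSZ = 8192


def parse_relation_path(message):
    """Return (database_oid, relfilenode) found in an error message, or None."""
    m = REL_PATH.search(message)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def parse_block(message):
    """Return the block number mentioned in an error message, or None."""
    m = re.search(r"\bblock (\d+)\b", message)
    return int(m.group(1)) if m else None


def block_offset(block):
    """Byte offset at which the given block starts inside the relation file."""
    return block * BLCKSZ


def lookup_queries(db_oid, relfilenode):
    """Build catalog queries that name the database and the relation.

    The second query must be run while connected to the database
    returned by the first one.
    """
    return (
        f"SELECT datname FROM pg_database WHERE oid = {db_oid};",
        "SELECT oid::regclass AS relation, relkind FROM pg_class "
        f"WHERE relfilenode = {relfilenode};",
    )


if __name__ == "__main__":
    msg = ('could not read block 5 in file "base/805208348/1259338118": '
           "read only 0 of 8192 bytes")
    db_oid, relfilenode = parse_relation_path(msg)
    print("block starts at byte", block_offset(parse_block(msg)))
    for query in lookup_queries(db_oid, relfilenode):
        print(query)
```

Comparing the computed offset with the actual size of the file on disk (for example with ls -l in the data directory) shows whether the file really is shorter than PostgreSQL expects.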

We use streaming replication to a different server on different
hardware. That server had been up for 300+ days and just had an incident
of:

LOG: consistent recovery state reached at 226/E7DE1680
WARNING: page 0 of relation base/805208133/1274861078 does not exist
CONTEXT: xlog redo insert: rel 1663/805208133/1274861078; tid 0/1
PANIC: WAL contains references to invalid pages
LOG: database system is ready to accept read only connections
CONTEXT: xlog redo insert: rel 1663/805208133/1274861078; tid 0/1
LOG: startup process (PID 2308) was terminated by signal 6: Aborted
LOG: terminating any other active server processes
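
As a reading aid for the log above: the "rel" triple in the CONTEXT line is tablespace OID / database OID / relfilenode (1663 is normally the OID of the pg_default tablespace, which matches the base/ path in the WARNING), and "tid 0/1" is block 0, item 1. The sketch below only decodes those fields and turns the LSN into a single integer so that positions can be compared. The function names are hypothetical, and it assumes the line format shown in this log.

```python
import re

# Format as it appears in the CONTEXT lines above:
#   rel <tablespace OID>/<database OID>/<relfilenode>; tid <block>/<item>
REDO_CONTEXT = re.compile(r"rel (\d+)/(\d+)/(\d+); tid (\d+)/(\d+)")


def parse_redo_context(line):
    """Decode the relation and tuple identifiers from a redo CONTEXT line."""
    m = REDO_CONTEXT.search(line)
    if m is None:
        return None
    spc, db, rel, block, item = (int(g) for g in m.groups())
    return {
        "tablespace_oid": spc,
        "database_oid": db,
        "relfilenode": rel,
        "block": block,
        "item": item,
    }


def parse_lsn(text):
    """Convert an LSN written as 'hi/lo' (both hexadecimal) to one integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


if __name__ == "__main__":
    ctx = parse_redo_context(
        "CONTEXT: xlog redo insert: rel 1663/805208133/1274861078; tid 0/1")
    print(ctx)
    print(parse_lsn("226/E7DE1680"))
```

The decoded database OID and relfilenode can then be checked against pg_database and pg_class on the primary to see which table the failing WAL record refers to.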

We've rebooted that server now and restarted the replication. We'll see
how it goes in a few hours.

I'm still very interested in hearing any hints you guys may have as to
how I should debug these problems.

> I've tried writing a program to simulate a workload that resembles the
> workload on the problematic tables, but I can't get that to fail. So
> what should be my next step in debugging this?

That program has been running for 24+ hours now, and everything just
works as expected, so still no luck in reproducing this problem.

Best regards

Jacob

P.S. Sorry about the double post with a different subject - my initial
post was held up for several hours due to putting "Help" in the subject,
so I thought it had been discarded by a list admin.
