Re: Next steps in debugging database storage problems?

From:	Jacob Bunk Nielsen <jacob(at)bunk(dot)cc>
To:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Next steps in debugging database storage problems?
Date:	2014-12-11 08:31:12
Message-ID:	spamdrop+87lhmed1v3.fsf@atom.bunk.cc
Lists:	pgsql-general

A final follow-up from my side to this post, for anyone who may find this
thread in the archives in the future.

On the 15th of August Jacob Bunk Nielsen <jacob(at)bunk(dot)cc> wrote:
> On the 1st of July 2014 Jacob Bunk Nielsen <jacob(at)bunk(dot)cc> wrote:
>
>> We have a PostgreSQL 9.3.4 running in an LXC container on Debian
>> Wheezy on a Linux 3.10.43 kernel on a Dell R620 server. Data are
>> stored on a XFS file system. We are seeing problems such as:
>>
>> unexpected data beyond EOF in block 2 of relation base/805208133/1238511128
>>
>> and
>>
>> could not read block 5 in file "base/805208348/1259338118": read only
>> 0 of 8192 bytes
>>
>> This seems to occur every few days after the server has been up for
>> 30-40 days. If we reboot the server it'll be another 30-40 days before
>> we see any problems again. [...]
>
> This time it took 45 days before this happened:
>
> LOG: unexpected EOF on standby connection
> ERROR: unexpected data beyond EOF in block 140 of relation base/805208885/805209852
> HINT: This has been seen to occur with buggy kernels; consider updating your system.
>
> It always happens with small tables with lots of inserts and deletes.
> From previous experience we know that it's now going to happen again in
> a few days, so we'll probably try to schedule a reboot to give us
> another 30-40 days.

We have concluded that it's probably a bug in autovacuum. Since we
changed how often those busy tables are vacuumed, we haven't seen any
problems for the past 2 months.

We changed:

autovacuum_vacuum_threshold = 100000 (default: 50)

and

autovacuum_vacuum_scale_factor = 0 (default: 0.2; 0 turns the scale factor
off, so only the fixed threshold above applies)

With the default settings, autovacuum ran on these small, busy tables
every minute, and eventually we would hit some bug that caused the
problems described above.
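
For readers who want to see why these two settings change the vacuum
frequency so much: the PostgreSQL documentation gives the trigger point as
base threshold + scale factor * number of tuples, and a table is vacuumed
once its dead tuples exceed that value. Below is a minimal sketch of that
arithmetic. The function name vacuum_trigger and the example row counts are
made up for illustration; they are not taken from our setup.

```python
def vacuum_trigger(reltuples, threshold=50, scale_factor=0.2):
    """Number of dead tuples above which autovacuum will vacuum a table.

    Follows the formula from the PostgreSQL docs:
    threshold + scale_factor * reltuples.
    Defaults mirror the stock settings (50 and 0.2).
    """
    return threshold + scale_factor * reltuples


# Hypothetical table sizes, chosen only to illustrate the difference.
for rows in (100, 10_000):
    default = vacuum_trigger(rows)
    tuned = vacuum_trigger(rows, threshold=100_000, scale_factor=0)
    print(f"{rows} rows: default trigger {default:.0f}, tuned trigger {tuned:.0f}")
```

With the defaults, a small table that sees many inserts and deletes crosses
the trigger point almost constantly, so it is picked up on nearly every
autovacuum pass. With the changed settings, the trigger point is a fixed
100000 dead tuples regardless of table size.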

My colleague, who did most of the work to find this, has promised to try
to create a working test case and file a proper bug report.

Best regards

Jacob

In response to

Re: Next steps in debugging database storage problems? at 2014-08-15 07:23:23 from Jacob Bunk Nielsen
