Re: ERROR: could not open file "base/125542/12631" Corruption?

From: Mike Broers <mbroers(at)gmail(dot)com>
To: "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: ERROR: could not open file "base/125542/12631" Corruption?
Date: 2013-10-04 19:14:39
Message-ID: CAB9893jm3fFmNrZpROuA_04WGY-i082mP4ofBewkPKXP-_HZQg@mail.gmail.com
Lists: pgsql-admin

Update: someone ran a 'cleanup script' via puppet on multiple hosts
yesterday, and it greedily deleted files that had not been modified in 15
days. This is the most likely culprit, so the mystery is basically solved.
Thankfully this is in QA, whew! I would still be interested to know whether
there is a way to have postgres check that the files it expects to find are
actually there, and to get an idea of the extent of the damage.
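
A minimal sketch of one way such a check could be done from outside the
server, not a built-in postgres feature. It assumes you can still run a query
in each database to list the paths postgres expects, for example
SELECT pg_relation_filepath(oid) FROM pg_class WHERE pg_relation_filepath(oid) IS NOT NULL;
and that you know the data directory. The function name and parameters below
are hypothetical, chosen only for illustration.

```python
import os


def find_missing_relation_files(data_dir, relative_paths):
    """Return the relation paths that have no file under data_dir.

    data_dir       -- the cluster's data directory (assumed known)
    relative_paths -- paths as returned by pg_relation_filepath(),
                      e.g. "base/125542/12631"
    """
    missing = []
    for rel in relative_paths:
        # pg_relation_filepath() yields NULL for relations without storage
        # (views, for example), so skip empty values.
        if not rel:
            continue
        if not os.path.isfile(os.path.join(data_dir, rel)):
            missing.append(rel)
    return missing
```

This only looks at the first segment of each relation's main fork. Extra
segments of large relations (".1", ".2", ...) and the "_fsm" and "_vm" forks
are not covered, and pg_class lists only the current database, so the query
has to be repeated per database. If the deleted files include the system
catalogs themselves, the query may fail before it returns anything.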

On Fri, Oct 4, 2013 at 12:10 PM, Mike Broers <mbroers(at)gmail(dot)com> wrote:

> Strange, this is happening in a totally different environment now too.
> The only thing these two environments share is a SAN, but I wouldn't think
> something going on at the SAN level would make files disappear. Any
> suggestions are greatly appreciated.
>
>
> On Fri, Oct 4, 2013 at 9:40 AM, Mike Broers <mbroers(at)gmail(dot)com> wrote:
>
>> Hello, our postgresql 9.2.4 qa database (thankfully it's just qa) seems to
>> be hosed.
>>
>> Starting at around 3:39am last night I started seeing errors about
>> missing files and now I cannot run a pg_dump or a vacuum without it
>> complaining about files that it cannot find with errors like this: ERROR:
>> could not open file "base/125542/12631". When I check the filesystem the
>> files are indeed not there. The 1am regular vacuum completed and its log
>> is clean. The postgres log is clean before these errors occurred.
>>
>> Since this is qa we do not perform backups, and the solution if we cannot
>> repair the problem will be to create a fresh qa server but I am intrigued
>> about how to determine the source of the problem and the extent of the
>> problem.
>>
>> Is there a way to force vacuum to continue on errors or an alternate way
>> to help determine all the missing files?
>>
>> It might be totally unrelated, but yesterday morning on this qa server I
>> stopped postgres, and created a symlink to pg_xlog so that it was writing
>> to a different volume, and restarted. This was working fine all day so it's
>> possibly a red herring but I thought I should mention it.
>>
>> Any advice is appreciated, thanks!
>>
>>
>>
>
