Data corruption

From: Russell Keane <Russell(dot)Keane(at)inps(dot)co(dot)uk>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Data corruption
Date: 2014-11-13 14:40:31
Message-ID: 8D0E5D045E36124A8F1DDDB463D548557D474B6387@mxsvr1.is.inps.co.uk
Lists: pgsql-general

We appear to have had some data corruption on a customer's PostgreSQL cluster.

They are on PostgreSQL 9.0.17 (32-bit)
Windows Server 2003 Service Pack 2
Intel Xeon 2.66 GHz
4 GB memory
RAID is set up but doesn't look good: it is currently showing a status of Degraded, and the Segments tab shows Segment 1 (Missing).
So I guess we can assume the issue is down to hardware.

An engineer has been dispatched to replace the hardware, and we are arranging to have the cluster shut down and backed up to a separate storage device.

Their postgresql.conf file is essentially the default, with only the following line added to the end:
custom_variable_classes = 'user_vars'

Everything was fine until 13:28 on 7 November, when a number of entries like this appeared in the log:
2014-11-07 13:28:45 GMT WARNING: worker took too long to start; cancelled

After that the log file was rotated, and the new one started with:
2014-11-07 14:15:19 GMT FATAL: the database system is starting up
2014-11-07 14:15:20 GMT FATAL: the database system is starting up
2014-11-07 14:15:20 GMT LOG: database system was interrupted; last known up at 2014-11-07 13:28:42 GMT
2014-11-07 14:15:21 GMT FATAL: the database system is starting up
2014-11-07 14:15:22 GMT FATAL: the database system is starting up
2014-11-07 14:15:23 GMT FATAL: the database system is starting up
2014-11-07 14:15:23 GMT LOG: database system was not properly shut down; automatic recovery in progress
2014-11-07 14:15:23 GMT LOG: record with zero length at 5/7B4CAC0
2014-11-07 14:15:23 GMT LOG: redo is not required
2014-11-07 14:15:24 GMT FATAL: the database system is starting up
2014-11-07 14:15:25 GMT FATAL: the database system is starting up
2014-11-07 14:15:25 GMT LOG: database system is ready to accept connections
2014-11-07 14:15:25 GMT LOG: autovacuum launcher started
2014-11-07 14:15:33 GMT LOG: unexpected EOF on client connection
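The recovery lines above report a zero-length record at WAL location 5/7B4CAC0, which is where replay found the end of WAL. A hedged Python sketch of mapping that location to the pg_xlog segment file that contains it, assuming the pre-9.3 naming scheme, the default 16 MB WAL segment size and timeline 1 (neither the segment size nor the timeline is stated in the log). The function names are hypothetical.

```python
# Hypothetical helpers: map a 9.0-style WAL location such as "5/7B4CAC0"
# to the pg_xlog segment file holding it.
# Assumptions: default 16 MB WAL segments and timeline 1 (not confirmed by the log).

XLOG_SEG_SIZE = 16 * 1024 * 1024  # default compile-time WAL segment size


def wal_file_for_location(location, timeline=1):
    """Return the 24-hex-digit WAL file name containing `location` (pre-9.3 layout)."""
    xlogid_hex, xrecoff_hex = location.split("/")
    xlogid = int(xlogid_hex, 16)    # logical log file number
    xrecoff = int(xrecoff_hex, 16)  # byte offset within that logical log file
    segno = xrecoff // XLOG_SEG_SIZE
    return "%08X%08X%08X" % (timeline, xlogid, segno)


def offset_in_segment(location):
    """Byte offset of `location` within its 16 MB segment file."""
    _, xrecoff_hex = location.split("/")
    return int(xrecoff_hex, 16) % XLOG_SEG_SIZE


if __name__ == "__main__":
    loc = "5/7B4CAC0"  # from "record with zero length at 5/7B4CAC0"
    print(wal_file_for_location(loc))
    print(offset_in_segment(loc))
```

Under those assumptions, 5/7B4CAC0 falls in 000000010000000500000007, at byte offset 11848384 within that file.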

Since then, whenever we try to write to or query one particular table, we receive the following error:
2014-11-07 15:13:57 GMT ERROR: invalid page header in block 29838 of relation base/16392/640564

As far as I can tell, it is always the same error (same block and same relation).
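The error identifies the relation by its file path (database OID 16392, relfilenode 640564), so the damaged page can be located on disk. Below is a minimal Python sketch that computes which segment file and byte offset hold block 29838 and roughly approximates the header sanity checks whose failure produces "invalid page header". It assumes a default build (8 kB blocks, 1 GB segment files, little-endian x86). The check is modelled on 9.0's PageHeaderIsValid() as best understood here and may not match it exactly. maxalign=8 is an assumption for this 32-bit Windows build, and all helper names are hypothetical. Any reading should be done against the file-level backup, not the live cluster.

```python
import os
import struct

# Assumptions (defaults, not confirmed for this build): 8 kB blocks,
# 1 GB relation segment files, little-endian x86 byte order.
BLCKSZ = 8192
RELSEG_SIZE = (1024 ** 3) // BLCKSZ   # 131072 blocks per segment file
PG_PAGE_LAYOUT_VERSION = 4            # page layout version used since 8.3
PD_VALID_FLAG_BITS = 0x0007           # PD_HAS_FREE_LINES | PD_PAGE_FULL | PD_ALL_VISIBLE
SIZE_OF_PAGE_HEADER = 24
# 9.0 page header: pd_lsn (two uint32), pd_tli, pd_flags, pd_lower, pd_upper,
# pd_special, pd_pagesize_version (uint16 each), pd_prune_xid (uint32).
HEADER_FMT = "<IIHHHHHHI"


def locate_block(relpath, block):
    """Return (segment file path, byte offset) for a block of a relation."""
    segno, block_in_seg = divmod(block, RELSEG_SIZE)
    path = relpath if segno == 0 else "%s.%d" % (relpath, segno)
    return path, block_in_seg * BLCKSZ


def read_block(data_dir, relpath, block):
    """Read one raw page; point data_dir at a file-level copy, not the live cluster."""
    path, offset = locate_block(relpath, block)
    with open(os.path.join(data_dir, path), "rb") as f:
        f.seek(offset)
        return f.read(BLCKSZ)


def page_header_looks_valid(page, maxalign=8):
    """Approximate the header sanity checks applied when a page is read in."""
    if len(page) != BLCKSZ:
        raise ValueError("expected a full %d-byte page" % BLCKSZ)
    if page == bytes(BLCKSZ):
        return True  # an all-zeroes page is treated as new, not corrupt
    (_lsn_log, _lsn_off, _tli, flags, lower, upper, special,
     size_version, _prune_xid) = struct.unpack_from(HEADER_FMT, page)
    return (
        (size_version & 0xFF00) == BLCKSZ
        and (size_version & 0x00FF) == PG_PAGE_LAYOUT_VERSION
        and (flags & ~PD_VALID_FLAG_BITS) == 0
        and SIZE_OF_PAGE_HEADER <= lower <= upper <= special <= BLCKSZ
        and special % maxalign == 0
    )


if __name__ == "__main__":
    print(locate_block("base/16392/640564", 29838))
```

For the relation in the error, locate_block returns ('base/16392/640564', 244432896): block 29838 sits in the first (unsuffixed) segment file, 29838 × 8192 bytes from its start.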

So the question is: what next?
We may have lost some data because it couldn't be written, but that's not the end of the world.
The more important thing is to prevent any further data loss.

Regards,

Russell Keane
INPS

Tel: +44 (0)20 7501 7277

Follow us on Twitter (https://twitter.com/INPSnews) | visit www.inps.co.uk

________________________________
Registered name: In Practice Systems Ltd.
Registered address: The Bread Factory, 1a Broughton Street, London, SW8 3QJ
Registered Number: 1788577
Registered in England
Visit our Internet Web site at www.inps.co.uk
The information in this internet email is confidential and is intended solely for the addressee. Access, copying or re-use of information in it by anyone else is not authorised. Any views or opinions presented are solely those of the author and do not necessarily represent those of INPS or any of its affiliates. If you are not the intended recipient please contact is(dot)helpdesk(at)inps(dot)co(dot)uk
