PostgreSQL corruption

From: James Sewell <james(dot)sewell(at)jirotech(dot)com>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: PostgreSQL corruption
Date: 2017-02-14 04:21:32
Message-ID: CAANVwEv4Mkv5keGZUdFT4FrCDQTRmx8vxA+PHD4G+Onuao+S=Q@mail.gmail.com
Lists: pgsql-general

Hello All,

I am working with a client who is seeing database corruption after hard
power-offs (the machines are at remote sites, so the cause could be a power
outage or user error).

Their environment is made up of many consumer-grade standalone machines
with the following configuration:

- Windows 7 SP1
- PostgreSQL 9.2.4
- Integrated RAID controller
  - Managed by Intel Rapid Storage Technology
  - RAID 1 over two disks
  - Disk caching disabled
  - Not battery backed
  - Disk cache disabled
- 2x Seagate SATA disk drives (ST500LM021-1KJ152,
  <http://www.seagate.com/www-content/product-content/momentus-fam/momentus-thin/en-us/docs/100737930b.pdf>)

PostgreSQL is configured as follows:

- fsync=on
- full_page_writes=on
- wal_sync_method=fsync_writethrough
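
With many machines in play, it may be worth confirming that each one really carries these three settings. The following is a minimal sketch, assuming the settings live directly in postgresql.conf (include directives are ignored) and that values are written literally as shown above. The names `parse_conf` and `durability_mismatches` are hypothetical helpers, not part of PostgreSQL.

```python
import re

# Expected values, taken from the settings listed above.
EXPECTED = {
    "fsync": "on",
    "full_page_writes": "on",
    "wal_sync_method": "fsync_writethrough",
}

# name, then "=" or whitespace, then a quoted or bare value; comments are skipped.
_SETTING = re.compile(
    r"^\s*([A-Za-z_][A-Za-z0-9_.]*)(?:\s*=\s*|\s+)('(?:[^']|'')*'|[^\s#']+)"
)


def parse_conf(text):
    """Return {name: value} for postgresql.conf-style text (last setting wins)."""
    settings = {}
    for line in text.splitlines():
        match = _SETTING.match(line)
        if not match:
            continue  # blank line, comment, or something this sketch does not handle
        name, value = match.group(1).lower(), match.group(2)
        if value.startswith("'"):
            value = value[1:-1].replace("''", "'")
        settings[name] = value.lower()
    return settings


def durability_mismatches(text, expected=EXPECTED):
    """Return {name: actual} for every expected setting that differs or is unset."""
    actual = parse_conf(text)
    return {k: actual.get(k) for k, v in expected.items() if actual.get(k) != v}
```

A value of `None` in the result means the setting is not present in the file, so the server default applies. On a running server, `SHOW fsync;` and friends remain the authoritative check.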

Windows is configured as follows:

- Disk caching disabled for the RAID1 set

They have already shown that the corruption is reproducible in a testbed,
both with and without OS/RAID controller caching, but I am working with
them to make the test process more detailed.

The new process will be:

1. Power on the machine
2. If PostgreSQL doesn't start, archive $PGDATA and run initdb
3. Run pg_dumpall to test for corruption
4. If pg_dumpall fails, archive $PGDATA and run initdb
5. Start the test suite (which mimics high load from their application),
   which INSERTs and DELETEs records both inside and outside transactions
6. After 15 minutes, cut power and repeat the process
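
Steps 1 to 4 could be scripted roughly as follows. This is only a sketch under several assumptions: the PostgreSQL binaries (`pg_ctl`, `pg_dumpall`, `initdb`) are on the PATH, connection details come from the environment, and the function names, outcome labels, and archive layout are hypothetical. Starting the load suite and cutting power (steps 5 and 6) happen outside this script.

```python
import os
import shutil
import subprocess
import time
from pathlib import Path


def run_ok(cmd):
    """Run a command and report whether it exited with status 0."""
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def archive_and_reinit(pgdata, archive_root, run=run_ok):
    """Move the suspect cluster aside for later analysis, then create a fresh one."""
    archive_root = Path(archive_root)
    archive_root.mkdir(parents=True, exist_ok=True)
    dest = archive_root / f"pgdata-{time.strftime('%Y%m%dT%H%M%S')}"
    shutil.move(str(pgdata), str(dest))
    run(["initdb", "-D", str(pgdata)])
    return dest


def boot_cycle(pgdata, archive_root, run=run_ok):
    """Steps 1-4 after power-on; returns 'clean', 'no_start', or 'dump_failed'."""
    outcome = "clean"
    if not run(["pg_ctl", "start", "-w", "-D", str(pgdata)]):
        outcome = "no_start"
    elif not run(["pg_dumpall", "-f", os.devnull]):
        # The dump output is discarded; only success or failure matters here.
        outcome = "dump_failed"
        run(["pg_ctl", "stop", "-m", "fast", "-D", str(pgdata)])
    if outcome != "clean":
        archive_and_reinit(pgdata, archive_root, run)
        run(["pg_ctl", "start", "-w", "-D", str(pgdata)])
    return outcome
```

Recording the returned outcome for every cycle, together with the machine and scenario, would make failure rates comparable across the testbed. The `run` parameter exists so the logic can be exercised without a real server.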

We are hoping to get about 20 machines in this testbed, giving us around
1500 power cycles per day.
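
As a quick sanity check on that figure, here is a small back-of-the-envelope sketch. The per-cycle overhead (boot, pg_dumpall, restart) is not something we have measured; the value below is simply what the 1500 estimate implies, and the function names are made up for illustration.

```python
MINUTES_PER_DAY = 24 * 60


def cycles_per_day(machines, load_minutes, overhead_minutes):
    """Total power cycles per day across all machines."""
    return machines * MINUTES_PER_DAY / (load_minutes + overhead_minutes)


def implied_overhead_minutes(machines, load_minutes, target_cycles):
    """Per-cycle overhead (minutes) that a given daily cycle target assumes."""
    return machines * MINUTES_PER_DAY / target_cycles - load_minutes


# Upper bound with zero overhead: 20 machines * 96 cycles each.
upper_bound = cycles_per_day(20, 15, 0)

# Overhead per cycle that the "around 1500" figure leaves room for.
overhead = implied_overhead_minutes(20, 15, 1500)
```

With 20 machines and 15 minutes of load per cycle, the ceiling is 1920 cycles per day, so a target of 1500 leaves a little over four minutes per cycle for booting and the pg_dumpall check.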

Test scenarios which have been floated so far:

- As described above, all caching off
- As described above, all caching off, 9.2 stable
- As described above, all caching off, 9.5 stable with checksums

Can anyone think of anything else we should be considering / testing /
factoring in?

Cheers,

James Sewell,
PostgreSQL Team Lead / Solutions Architect

Suite 112, Jones Bay Wharf, 26-32 Pirrama Road, Pyrmont NSW 2009
P (+61) 2 8099 9000  W www.jirotech.com  F (+61) 2 8099 9099

