From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Chris Angelico <rosuav(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Plug-pull testing worked, diskchecker.pl failed
Date: 2012-10-24 16:18:53
Message-ID: CAOR=d=3XFcVgMu9Eyd0nFehW1x=qzrewxDzPcC0MVy3J5MD6XQ@mail.gmail.com
Lists: pgsql-general
On Wed, Oct 24, 2012 at 8:04 AM, Chris Angelico <rosuav(at)gmail(dot)com> wrote:
> On Tue, Oct 23, 2012 at 9:51 AM, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com> wrote:
>> On Mon, Oct 22, 2012 at 7:17 AM, Chris Angelico <rosuav(at)gmail(dot)com> wrote:
>>> After reading the comments last week about SSDs, I did some testing of
>>> the ones we have at work - each of my test-boxes (three with SSDs, one
>>> with HDD) subjected to multiple stand-alone plug-pull tests, using
>>> pgbench to provide load. So far, there've been no instances of
>>> PostgreSQL data corruption, but diskchecker.pl reported huge numbers
>>> of errors.
>>
>> Try starting pgbench, then about halfway through the
>> checkpoint_timeout interval issue a manual CHECKPOINT, and while the
>> checkpoint is still running, pull the plug.
>>
>> Then, after bringing the server back up (assuming pg starts at all),
>> see if pg_dump generates any errors.
>
> Thanks for the tip. I've been flat-out at work these past few days and
> haven't gotten around to testing in the middle of a checkpoint, but I
> have done something that might also be of interest. It's inspired by a
> combination of diskchecker and pgbench; a harness that puts the
> database under load and retains a record of what's been done.
>
> In brief: Create a table with N (eg 100) rows, then spin as fast as
> possible, incrementing a counter against one random row and also
> incrementing the "Total" counter. When the database goes down, wait
> for it to come up again; when it does, check against the local copy of
> the counters and report any discrepancies.
>
> The code's written in Pike, using the same database connection logic
> that we use in our actual application (well, some of our code is C++
> and some is PHP, so this corresponds to one part of our app), so this
> is roughly representative of real usage.
>
> It's about a page or two of code: http://pastebin.com/UNTj642Y
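The verification logic Chris describes can be sketched in miniature; this is Python rather than the original Pike, runs in memory rather than against PostgreSQL, and all names (run_harness, check, crash_at) are illustrative assumptions, not code from the pastebin. In the real harness each increment is a committed UPDATE, and the "local copy" records only acknowledged commits:

```python
import random

N_ROWS = 100  # number of counter rows, as in the description above


def run_harness(iterations, crash_at=None):
    """Increment a random row counter plus the total, mirroring each
    acknowledged commit into a local copy. A simulated crash at step
    crash_at stops before the local copy is updated, i.e. the commit
    was applied but never acknowledged (the benign case)."""
    db = {"rows": [0] * N_ROWS, "total": 0}
    local = {"rows": [0] * N_ROWS, "total": 0}
    for i in range(iterations):
        r = random.randrange(N_ROWS)
        # One "transaction": bump the chosen row and the total together.
        db["rows"][r] += 1
        db["total"] += 1
        if crash_at is not None and i == crash_at:
            return db, local  # power lost before the ack reached us
        local["rows"][r] += 1
        local["total"] += 1
    return db, local


def check(db, local):
    """After 'restart', compare DB state against the local record.
    db behind local = an acknowledged commit was lost: real durability
    failure. db ahead by one = unacknowledged commit: acceptable."""
    diffs = [(i, a, b)
             for i, (a, b) in enumerate(zip(db["rows"], local["rows"]))
             if a != b]
    return diffs, db["total"] - local["total"]
```

The asymmetry in check() is the heart of the method: a crash may legitimately leave the database one transaction *ahead* of the client's records, but it must never be *behind* them.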
Very cool. Nice little project.
> Currently, all the key parameters (database connection info (which has
> been censored for the pastebin version), pool size, thread count, etc)
> are just variables visible in the script, simpler than parsing
> command-line arguments.
>
> Is this a useful and plausible testing methodology? It's definitely
> shown up some failures. On a hard disk, all is well as long as the
> write-back cache is disabled; on the SSDs, I can't make them reliable.
Yes, it seems to be quite a good idea, actually.
> Is a single table enough to test for corruption with?
If it fails, definitely; if it passes, maybe.