Re: 8.3.5 broken after power fail SOLVED

From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
Cc: Naomi Walker <nwalker(at)eldocomp(dot)com>, Michael Monnerie <michael(dot)monnerie(at)is(dot)it-management(dot)at>, pgsql-admin(at)postgresql(dot)org
Subject: Re: 8.3.5 broken after power fail SOLVED
Date: 2009-02-22 02:43:11
Message-ID: dcc563d10902211843k54b38175m615b9d83d30927f4@mail.gmail.com
Lists: pgsql-admin

On Sat, Feb 21, 2009 at 3:41 PM, Ron Mayer
<rm_pg(at)cheapcomplexdevices(dot)com> wrote:
> Naomi Walker wrote:
>> Other than disaster tests, how would I know if I have a system that
>> lies about fsync?
>
> Well, the Linux kernel tries to detect it at boot and
> will print messages like these:
> %dmesg | grep 'disabling barriers'
> JBD: barrier-based sync failed on md1 - disabling barriers
> JBD: barrier-based sync failed on hda3 - disabling barriers
> when it detects certain types of unreliable fsyncs. The command
> %hdparm -I /dev/hdf | grep FLUSH_CACHE_EXT
> will give you a clue whether a hard drive can even support
> a non-lying fsync when its internal cache is enabled.
>
>
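To scan for those barrier messages programmatically instead of by eye, here is a minimal Python sketch. It assumes the kernel uses the wording shown in the quoted output (other kernels or journaling layers may phrase it differently), the function names are hypothetical, `dmesg` may require root on some systems, and boot-time messages can rotate out of the ring buffer on a long-running machine.

```python
import re
import subprocess

# Matches lines like "JBD: barrier-based sync failed on md1 - disabling barriers".
# The pattern is an assumption based on the example output; it ignores the
# "JBD:" prefix so a differently named journal layer would still match.
BARRIER_RE = re.compile(r"barrier-based sync failed on (\S+) - disabling barriers")


def devices_without_barriers(log_text):
    """Return device names the kernel reported disabling write barriers on."""
    return [m.group(1) for m in BARRIER_RE.finditer(log_text)]


def read_dmesg():
    """Capture the kernel ring buffer as text."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    for dev in devices_without_barriers(read_dmesg()):
        print(f"barriers disabled on {dev}: fsync on this device may not be durable")
```

An empty result only means the kernel did not log this particular failure; it is not proof that fsync is honest.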
> Sadly, some filesystems (ext3) lie even beyond what
> Linux itself does: they only use write barriers correctly
> when the inode itself is modified, not when only the data is modified.
> A test program here:
> http://archives.postgresql.org/pgsql-performance/2008-08/msg00159.php
> can detect cases where the kernel and drive don't lie
> about fsync but ext3 lies in spite of them. More background
> info is here:
> http://article.gmane.org/gmane.linux.file-systems/21373
> http://thread.gmane.org/gmane.linux.kernel/646040
>
>
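The linked program is not reproduced here. As a rough, hypothetical sketch of the same idea, the Python below times write+fsync cycles two ways: rewriting one block in place (file size never changes) and appending (file size, and therefore the inode, changes every time). The assumption is that if in-place overwrites fsync far faster than appends on the same disk, data-only writes may not be forcing the drive cache to flush. The function name and the 8 kB block size are illustrative choices.

```python
import os
import tempfile
import time


def fsync_rate(path, count=200, overwrite=True, block=b"x" * 8192):
    """Return write+fsync cycles per second on `path`.

    overwrite=True rewrites the same block at offset 0, so the file size never
    changes; on filesystems with one-second timestamps (such as ext3) the inode
    is then only dirtied when mtime ticks over to a new second.
    overwrite=False appends, so every write changes the file size.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, block)  # seed one block so overwrites land on existing data
        os.fsync(fd)
        start = time.perf_counter()
        for _ in range(count):
            if overwrite:
                os.pwrite(fd, block, 0)  # in place; does not move the file offset
            else:
                os.write(fd, block)  # append at the current offset
            os.fsync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return count / max(elapsed, 1e-9)


if __name__ == "__main__":
    # Assumption: point this at the filesystem holding your data directory;
    # the temp dir is only a placeholder and may even live on tmpfs.
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "fsync_probe")
        print(f"overwrite: {fsync_rate(path, overwrite=True):.0f} fsyncs/s")
        print(f"append:    {fsync_rate(path, overwrite=False):.0f} fsyncs/s")
```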
> Elsewhere in the archives you can find programs that measure
> how fast fsyncs happen on your hardware; you can then check
> whether those numbers roughly match how fast your disks spin.
> But you still need to make sure the test program uses the same
> sync method that your PostgreSQL configuration (wal_sync_method)
> selects.
>
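The "match how fast your disks spin" check can be made concrete with a common rule of thumb: an honest fsync that rewrites the same spot on a single rotating disk waits roughly one revolution, so the ceiling is about rpm / 60 per second (about 120/s at 7200 rpm). This Python sketch encodes that heuristic; the helper names and the 1.5x `slack` factor are hypothetical choices, and a battery-backed write cache can legitimately exceed the ceiling.

```python
def max_honest_fsyncs_per_sec(rpm):
    """Rough ceiling on fsyncs/s for one rotating disk: one per revolution."""
    return rpm / 60.0


def looks_like_lying(measured_rate, rpm, slack=1.5):
    """Flag a measured fsync rate well above the rotational ceiling.

    `slack` (hypothetical) allows for measurement noise. A controller with a
    battery-backed write cache may safely report rates far above this, so a
    True result means "investigate", not "definitely unsafe".
    """
    return measured_rate > slack * max_honest_fsyncs_per_sec(rpm)
```

For example, a 7200 rpm drive reporting 2000 fsyncs/s with its cache enabled and no battery-backed controller is very likely acknowledging writes before they reach the platter.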
> I wonder if the only really safe way is to run a very
> write-intensive database script, kill your system in a
> number of ways (including yanking power to the system, to
> the disk arrays, etc.), and see whether your database survives.

Well, you can't prove it's 100% safe, but you can usually find most
of the unsafe systems this way. I usually set up a big pgbench db, run
500 or so concurrent clients, wait 5 or 10 minutes, issue a CHECKPOINT,
and pull the plug halfway through it. It's still possible for a system
to fail after passing this test, but I feel a lot better knowing I've
done it a couple of times and the db came back up without problems.
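The plug-pull procedure above can be written down as an ordered list of commands. This Python 3.8+ sketch only prints the plan, since the plug itself has to be pulled by hand. Assumptions: the database name `bench` and `scale=100` are hypothetical stand-ins for "a big pgbench db"; `-T` (run for N seconds) needs pgbench 8.4 or later, so on 8.3 you would use `-t` (transactions per client) instead; and `max_connections` must be at least the client count.

```python
import shlex


def crash_test_plan(dbname, scale=100, clients=500, wait_minutes=10):
    """Return the steps of one plug-pull run as (note, argv) pairs.

    Hypothetical helper: defaults follow the post (about 500 clients,
    wait 5 or 10 minutes); `scale` is an arbitrary choice for "big".
    """
    # Keep the load running well past the wait so the checkpoint happens under load.
    run_seconds = wait_minutes * 60 * 3
    return [
        ("initialize a large pgbench database",
         ["pgbench", "-i", "-s", str(scale), dbname]),
        ("start the load in the background (needs max_connections >= clients)",
         ["pgbench", "-c", str(clients), "-T", str(run_seconds), dbname]),
        (f"after ~{wait_minutes} minutes, force a checkpoint and pull the plug while it runs",
         ["psql", "-d", dbname, "-c", "CHECKPOINT"]),
    ]


if __name__ == "__main__":
    for note, argv in crash_test_plan("bench"):
        print(f"# {note}\n{shlex.join(argv)}")
```

`CHECKPOINT` returns only when the checkpoint completes, so the plug must be pulled while `psql` is still waiting. After power returns, start the server, let crash recovery finish, and check the server log and a query against the pgbench tables before counting the run as a pass.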
