Re: Possible causes for database corruption and solutions

From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Michael Clark <codingninja(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Possible causes for database corruption and solutions
Date: 2009-12-16 02:34:50
Message-ID: 4B28474A.3000501@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 16/12/2009 6:39 AM, Michael Clark wrote:
> Hello all,
>
> Over the past 6 months or so I have posted to the list a couple times
> looking for information regarding recovering databases from corruption.
> At the time the incidents of corruption among our users was very low,
> but the frequency is starting to increase, most likely due to the
> increased user base that has upgraded to the version of our software
> that uses Postgres.
>
> I have a couple questions I am hoping to get some feedback on.

> Secondly, I ask about an alternative solution to the corruption problem
> because with preliminary testing we have seen a significant degradation
> in performance.

From changing to fsync_writethrough ? That's a good thing - it suggests
that maybe now the data is actually hitting disk when Pg asks it to.

You can have fast(er), or safe, but not both. Now that your database is
actually doing what it should be and truthfully promising that data has
hit disk when you commit, you may have to adopt some strategies to
reduce the number of very short repetitive transactions you perform.

( Should Pg perhaps detect OS X and switch the default to
fsync_writethrough ? Or are the "test to see if fsync() works on
startup" plans going anywhere? )

> I then restore the dump into a newly created DB (createdb -E UTF8 Test1)
> with the command:
> pg_restore -F c -d Test1 pgbackup

Make sure to do the restore in a single transaction. It will be
*enormously*, *massively* faster. See the "--single-transaction" option
to pg_restore.

If you can't do that, then set a commit_delay so that PostgreSQL can
batch the fsync()s for commmits together. See postgresql.conf and the
documentation for commit_delay and commit_siblings.

> I am hoping there may be an alternative to fsync_writethrough and/or we
> are barking up the wrong tree w.r.t. the cause of the corruptions.

If fsync_writethrough slowed things down that much then you've almost
certainly nailed the cause of the corruptions. Now you just need to tune
your DB, and adapt how you use the DB, so that you're less affected by
the necessary performance hit imposed by safe and reliable use of disk
storage.

--
Craig Ringer

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Craig Ringer 2009-12-16 02:45:06 Re: Possible causes for database corruption and solutions
Previous Message Phoenix Kiula 2009-12-16 01:13:29 Re: How to remove non-UTF values from a table?