Tambet Matiisen <tambet(dot)matiisen(at)gmail(dot)com> wrote:
> On 16.03.2011 17:09, Kevin Grittner wrote:
>> Tambet Matiisen<tambet(dot)matiisen(at)gmail(dot)com> wrote:
>>
>>> Pre-live database is restored from live database dump every
>>> night.
>>
>> How is that done? A single pg_dump of the entire live database
>> restored using psql? Are both database servers at the same
>> PostgreSQL version?
>
> Yes, I use pg_dump on live server and the result is
> rdiff-backupped into development server. Whole SQL dump is 12G
> without compression and the rdiff delta is about 10-20MB every
> day. Then I drop pre-live database on development server and
> recreate it using createdb and psql.
createdb, not initdb? I suggest you backup and delete everything in
the data directory, and start with initdb, and see whether the
problem still exists. If it goes away, the problem was in your
shared system tables. If it persists, the problem is in your backup
files, and I would try a delete and a fresh copy. If *that* fixes
it you know the problem was with rdiff-backup. (Of course, keeping
copies of things before the delete might provide useful forensic
information.)
> For a while development server was running 8.4 and live server
> 8.1. Now both are 8.4, but this shouldn't matter, as I do backup
> and restore via SQL.
I hope you were using the 8.4 version of pg_dump when you were in
the dual-version situation. Using the earlier version of pg_dump is
not guaranteed to provide a backup which can be cleanly installed on
a later version. That could *possibly* be related to current
problems.
>> You're running pg_dump against a database you just restored from
>> a pg_dump image?
>
> Hmm, yeah. This sounds rather dumb, but haven't got to that yet.
Well, it might not be as dumb as you think, if it uncovered a
problem with your dump/restore process from live to pre-live.
> Development server contains some additional databases as well,
> that do not exist on live server.
So are you really using pg_dumpall or pg_dump?
>>> Both fsync and full_page_writes are on.
>>
>> Good. Without those an OS or hardware crash can corrupt your
>> database.
>
> Actually they are commented out, but I suppose this means "on".
Yeah, they default to the safe setting.
> It's not ECC memory.
Well, then there has been proven to be a non-negligible possibility
of occasional random bit-flips. Seriously, next time you upgrade,
make sure any database server has ECC RAM.
> It is possible, that restore of pre-live database using psql lasts
> so long, that backup of the same database using pg_dump is already
> kicking in.
Hmmm... You might want to do enough logging of the processes to be
able to confirm or eliminate that possibility. Dumping an
incompletely-restored database might generate some odd errors.
-Kevin