Database corruption in 7.0.3

From: Tim Allen <tim(at)proximity(dot)com(dot)au>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Database corruption in 7.0.3
Date: 2001-03-15 08:52:13
Message-ID: Pine.LNX.4.21.0103151916550.16580-100000@bee.proximity.com.au
Lists: pgsql-hackers

We have an application that we were running quite happily using pg6.5.3
at various customer sites. Now we are about to roll out a new version of
our application, and we are going to use pg7.0.3. However, in testing
we've come across a couple of isolated incidents of database
corruption. They are sufficiently rare that I can't reproduce the problem,
nor can I put my finger on just what application behaviour causes the
problems.

The symptoms most often involve some sort of index corruption, which is
reported by vacuum, and it seems that vacuum can fix it. On occasion vacuum
reports "invalid OID" or similar (sorry, I don't have the exact wording of
the message). On one occasion the database was corrupted to the point of
unusability (i.e. vacuum admitted that it couldn't fix the problem), and a
dump/restore was required (thankfully that at least worked). The index
corruption also occasionally manifests itself in the form of spurious
uniqueness constraint violation errors.

The previous version of our app using 6.5.3 has never shown the slightest
symptom of database misbehaviour, to the best of my knowledge, despite
fairly extensive use. So our expectations are fairly high :-).

One thing that is different about the new version of our app is that we
now use multiple connections to the database (previously we only had
one). We can in practice have transactions in progress on several
connections at once, and it is possible for some transactions to be rolled
back under application control (i.e. an explicit ROLLBACK; statement).

I realise I haven't really provided an awful lot of information that would
help identify the problem, so I shall attempt to be understanding if
no-one can offer any useful suggestions. But I hope someone can :-). Has
anyone seen this sort of problem before? Are there any known
database-corrupting bugs in 7.0.3? I don't recall anyone mentioning any on
the mailing lists. Is using multiple connections likely to expose any
known areas of risk?

BTW we are using plain vanilla SQL, no triggers, no new types defined, no
functions, no referential integrity checks, nothing more ambitious than a
multi-column primary key.

The platform is x86 Red Hat Linux 6.2. Curiously enough, on one of our
testing boxes and on my development box we have never seen this, but we
have seen it several times on our other test box and at least one customer
site, so there is some possibility it's related to dodgy hardware. The
customer box with the problem is a multi-processor box, all the other
boxes we've tested on are single-processor.

TIA for any help,

Tim

--
-----------------------------------------------
Tim Allen tim(at)proximity(dot)com(dot)au
Proximity Pty Ltd http://www.proximity.com.au/
http://www4.tpg.com.au/users/rita_tim/
