Serious Crash last Friday

From: "Henrik Steffen" <steffen(at)city-map(dot)de>
To: <pgsql-general(at)postgresql(dot)org>
Subject: Serious Crash last Friday
Date: 2002-06-17 06:43:37
Message-ID: 025001c215ca$4fe9e1a0$7100a8c0@topconcepts.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


Hello all,

on Friday we experienced a very very worrying crash of our postgresql
server.

Our system runs on a Intel Pentium IV, 1.6 GHz, 1 GB RAM with latest
Postgresql-Server
7.2.1 (redhat rpm) - on rather heavy load

There are 109 user tables in one database, the largest tables contain 60
columns and
approx. 400.000 rows. it's a dedicated database-only machine.

Well, the crash was indicated as follows: One of my employees complained
that she couldn't
work anymore (via webinterface). The error-message was due to an error in
the
employee-table. This particular table has a unique row for employee-numbers.
Suddenly
there were 11 entries for the same employee. Even my name was included
twice, and
another employee still working on friday afternoon was also included 3
times. Note:
This was a table with a UNIQUE KEY - this shouldn't be possible IMHO.

Taking a closer look, I found additional tables, with non-unique values in
UNIQUE columns.

When trying to delete unique values by using the OIDs, I found out, that
even the OIDs
were the same!!!! Taking a yet closer look, I found out by querying
pg_tables that
there were duplicates of some tables. Then there was the message: "Backend
message type
0x44 arrived while idle"

I was running VACUUM and VACUUM FULL a hundred times - but it failed to
repair these
errors. It didn't even succeed in running VACUUM on all tables: VACUUM
complained something
about "UNIQUE" (I didn't write down the exact error message though).

Then I tried to DUMP as much as I could, then I stopped the database, moved
the db-folder to
a different location, did a new initdb and restored the whole system.
Unfortunately
there was one table I couldn't dump at all and I had to use the 15 hours old
backup copy.

But, please correct me if I am wrong, this should never actually happen,
shouldn't it?

Anyone had any of these problems before? I will see if this happens again -
and if it
does I will have to think about using a different backend-server. I'll don't
have to
explain to you, that a database server that corrupts data, is completely
useless.

Mit freundlichem Gruß

Henrik Steffen
Geschäftsführer

top concepts Internetmarketing GmbH
Am Steinkamp 7 - D-21684 Stade - Germany
--------------------------------------------------------
http://www.topconcepts.com Tel. +49 4141 991230
mail: steffen(at)topconcepts(dot)com Fax. +49 4141 991233
--------------------------------------------------------
24h-Support Hotline: +49 1908 34697 (EUR 1.86/Min,topc)
--------------------------------------------------------
System-Partner gesucht: http://www.franchise.city-map.de
--------------------------------------------------------
Handelsregister: AG Stade HRB 5811 - UstId: DE 213645563
--------------------------------------------------------

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Martijn van Oosterhout 2002-06-17 07:43:11 Re: Serious Crash last Friday
Previous Message tony 2002-06-17 06:19:36 Re: advocacy