Re: Lost rows/data corruption?

From: "Andrew Hall" <temp02(at)bluereef(dot)com(dot)au>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Lost rows/data corruption?
Date: 2005-02-15 04:05:50
Message-ID: 008001c51313$a2d2d680$5001010a@bluereef.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Tom,

Here is the data you requested. It took little while to gather it as this
kind of corruption doesn't happen all the time.

The first sign that we know something is wrong is our application freezing
when communicating with the DB.

If we then issue a vacuum, here's the output:

sonar=# VACUUM FULL;
WARNING: index "user_session_pkey" contains 85613 row versions, but table
contains 85513 row versions
HINT: Rebuild the index with REINDEX.
WARNING: index "user_session_pkey" contains 85613 row versions, but table
contains 85513 row versions
HINT: Rebuild the index with REINDEX.
VACUUM

If we the try a reindex, here's the output:

sonar=# REINDEX TABLE user_session;
ERROR: could not create unique index
DETAIL: Table contains duplicated values.

There are duplicates and massive amounts of erroneous data in most columns
in the table. From here, no amount of row deletion seems to clear the issue,
the only remedy is a truncate on the table and repopulate. In fact if we try
even a select we get this:

WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and
repeat your command.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

Here is the schema for the table:

sonar=# \d user_session;
Table "public.user_session"
Column | Type |
Modifiers

----------------+-----------------------------+---------------------------------
-------------------
user_id | integer | not null
session_id | integer | not null
group_id | integer |
src | character varying(22) | not null
hit_bytes_in | bigint |
miss_bytes_in | bigint |
logon_time | timestamp without time zone | default
('now'::text)::timestamp
(6) with time zone
logoff_time | timestamp without time zone | default
('now'::text)::timestamp
(6) with time zone
account_active | character(1) | default 'Y'::bpchar
hit_bytes_out | bigint |
miss_bytes_out | bigint |
cost_bytes_in | double precision |
cost_bytes_out | double precision |
time_cost | double precision |
Indexes:
"user_session_pkey" primary key, btree (user_id, session_id)
Foreign-key constraints:
"$1" FOREIGN KEY (user_id) REFERENCES user_table(user_id) ON DELETE
CASCADE
Triggers:
delsessionusagetrigger AFTER DELETE ON user_session FOR EACH ROW EXECUTE
PRO
CEDURE delsessionusagefunc()

I'm most concerned that a primary constraint is being actively violated, and
I don't understand how this is possible.

Any help appreciated.
----- Original Message -----
From: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Andrew Hall" <temp02(at)bluereef(dot)com(dot)au>
Cc: <pgsql-general(at)postgresql(dot)org>
Sent: Friday, February 04, 2005 10:12 AM
Subject: Re: [GENERAL] Lost rows/data corruption?

> "Andrew Hall" <temp02(at)bluereef(dot)com(dot)au> writes:
>> We have a long running DB application using PG7.4.6. We do a VACUUM FULL
>> every night and a normal 'maintenance' VACUUM every hour. We do nothing
>> with
>> any indexes. Every now and then we get errors from the database whereby
>> an
>> update will fail on a table saying that there is duplicate violation on a
>> primary keyed row. Theoretically this is impossible as the constraint
>> should
>> prevent any duplication, ever. When the next vacuum is run, we see an
>> error
>> saying that there are more rows than the database can account for (or
>> words
>> to that effect) and we should either REINDEX or TRUNCATE the table.
>
> Could we see the exact log trace, rather than handwaving? Also the
> schemas of the table(s)/index(es) involved might be useful.
>
> regards, tom lane

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2005-02-15 04:25:41 Re: Lost rows/data corruption?
Previous Message Bruce Momjian 2005-02-15 03:30:35 Re: Bug in COPY from CSV?