Re: cannot delete corrupted rows after DB corruption: tuple concurrently updated

From: john gale <john(at)smadness(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: cannot delete corrupted rows after DB corruption: tuple concurrently updated
Date: 2014-02-26 21:57:24
Message-ID: 1E61A5BF-2388-4B35-861D-BC7A7323161B@smadness.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


On Feb 26, 2014, at 2:59 AM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:

> On 26 Únor 2014, 8:45, john gale wrote:
>
>> munin2=# delete from testruns where ctid = '(37069305,4)';
>> ERROR: tuple concurrently updated
>
> AFAIK this error is raised when a before trigger modifies the row that is
> being deleted. Imagine a trigger that does this
>
> UPDATE testruns SET mycolumn = 1 WHERE id = OLD.id;
> RETURN OLD;
>
> Given the way MVCC in postgres works (copying row when updating), the
> error makes sense. In 9.0 this worked by silently skipping the DELETE
> (incidentally, I had a few reports about tables that can't be deleted
> because of this in the past month).
>
> Anyway, do you have any triggers on the table? If yes, try to disable
> them. I suspect the data are corrupted in a way that causes update on the
> deleted row - either directly, or maybe because of a cascading effect.

There were a few triggers auto-created by a foreign key constraint but we removed the constraint, which removed the triggers:

munin2=# select * from pg_trigger;
tgrelid | tgname | tgfoid | tgtype | tgenabled | tgisinternal | tgconstrrelid | tgconstrindid | tgconstraint | tgdeferrable | tginitdeferred | tgnargs | tgattr | tgargs | tgqual
---------+--------+--------+--------+-----------+--------------+---------------+---------------+--------------+--------------+----------------+---------+--------+--------+--------
(0 rows)

The tuple error still exists, however:

munin2=# delete from testruns where ctid = '(37069305,4)';
ERROR: tuple concurrently updated
munin2=# select id from testruns where ctid = '(37069305,4)';
id
-----------
141908486
(1 row)

munin2=# delete from testruns where id = 141908486;
ERROR: tuple concurrently updated
munin2=# select * from testruns where id = 141908486;
ERROR: unexpected chunk number 0 (expected 1) for toast value 155900302 in pg_toast_16822

> I'm wondering if it might be caused by RI triggers - maybe yes, but I'm
> not aware of any RI trigger doing updates.
>
> That being said, I think that what you're doing is wrong. If you think you
> have a corrupted database, I'd strongly suggest doing dump/restore. Or how
> do you know there's no other corruption lurking in the files, slowly
> spreading to other parts of the database?

We're aware that we're shoveling dirt to patch a crack in a large highway. However at the moment we value uptime rather than strict integrity of the data (141mil rows allows some wiggle room), and since we don't modify these rows after they're inserted, I can't imagine how this kind of row corruption can "slowly spread to other parts of the database".

~ john

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Jeff Janes 2014-02-26 22:08:57 Re: Cancelling of autovacuums considered harmful
Previous Message Scott Marlowe 2014-02-26 21:34:33 Re: Cancelling of autovacuums considered harmful