Re: Database corruption event, unlockable rows, possibly bogus virtual xids? (-1/4444444444)

From: Ned Wolpert <ned(dot)wolpert(at)imemories(dot)com>
To: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Cc: "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Database corruption event, unlockable rows, possibly bogus virtual xids? (-1/4444444444)
Date: 2013-02-22 16:59:27
Message-ID: CAFehBFno+H1FLWAkPqmS4tHOFCTw6vc0RS4K+XbV=SBvjgWytg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

As it turns out, it was prepared transactions at fault. I restored my test
bed, forced a rollback and the locks were gone. At least now I can connect
those dots. :-)

On Fri, Feb 22, 2013 at 1:15 AM, Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>wrote:

> Ned Wolpert wrote:
> > I'm doing a postmortem on a corruption event we had. I have an idea on
> what happened, but not sure.
> > I figure I'd share what happened and see if I'm close to right here.
> >
> > Event: Running 9.1.6 with hot-standby, archiving 4 months of wal
> files, and even a nightly pg_dump
> > all. 50G database. Trying to update or delete a row in a small (21 row,
> but heavily used table) would
> > lock up completely. Never finish. Removed all clients, restarted the db
> instance, no joy. Check
> > pg_stat_activity, and nothing that wasn't idle.... run the delete,
> locked up.
> >
> > Ran (SELECT*FROM pg_class JOIN pg_locks ON pg_locks.relation =
> pg_class.oid;) with no clients
> > touching this row, fresh restart of the db, and saw virtualtransactions
> against this same table where
> > the values would be -1/nnnnn were nnnnn was a huge number. Turns out we
> had about 159 entries from
> > different tables in the database. Checked hot-standby and, of course,
> no locks or anything. Switched
> > to hot-standby.
> >
> > Hot-standby instantly gained these locks, Also noticed that 2 prepared
> transactions migrated to the
> > hot-standby. Binary upgraded to 9.1.8, locks still existed. Ended up
> executing the one thing we knew
> > would work. Take the site down, pg_dumpall to fresh instance. Everything
> is fine.
> >
> > A little more background: We were running 9.1.4 back when 9.1.6 came
> out. We saw there was possible
> > corruption issues and did a binary upgrade and reindexing. Everything
> seemed to be fine, but I wonder
> > if we really had problems back then. We rebuilt the hot-standby after
> the binary upgrade via normal
> > restore and wal-file replays. I should also note that this row that had
> the lock on it that would not
> > go away, was created by an app server that was killed (via -9) since it
> was non-responsive, and the
> > row 'create date' (in db and logs) is the exact time the app server was
> killed.
> >
> > I was wondering if a) these virtualtransactions that start with '-1/'
> indicate a problem, b) if this
> > could have happened from pre 9.1.6 corruption that was fixed in 9.1.6.
> Or, could this have occurred
> > when we killed that app server? Or.... am I looking in the wrong place.
> >
> > I do still have the old data directories so I can start them up and
> check out the dataset. Any advice?
>
> What do you get for
> SELECT * FROM pg_prepared_xacts;
>
> Have you tried to ROLLBACK PREPARED those prepared transactions?
>
> The "-1" is quite alright:
>
> BEGIN;
> INSERT INTO addr VALUES (10, '127.0.0.1');
> PREPARE TRANSACTION 'test';
>
> test=> SELECT * FROM pg_prepared_xacts;
> transaction | gid | prepared | owner | database
> -------------+------+-------------------------------+---------+----------
> 2275 | test | 2013-02-22 09:09:04.627137+01 | laurenz | test
>
> test=> SELECT database, relation, transactionid, virtualtransaction, mode
> FROM pg_locks;
> database | relation | transactionid | virtualtransaction | mode
>
> ----------+----------+---------------+--------------------+------------------
> 16421 | 11069 | | 2/15 | AccessShareLock
> | | | 2/15 | ExclusiveLock
> | | 2275 | -1/2275 | ExclusiveLock
> 16421 | 24711 | | -1/2275 |
> RowExclusiveLock
> (4 rows)
>
> The last two rows are the prepared transaction.
>
> ROLLBACK PREPARED 'test';
>
> Gets rid of them.
>
> Yours,
> Laurenz Albe
>

--
Virtually, Ned Wolpert

"Settle thy studies, Faustus, and begin..." --Marlowe

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message domehead100 2013-02-22 22:11:03 base backup/restore + streaming replication => weirdness
Previous Message Albe Laurenz 2013-02-22 08:15:50 Re: Database corruption event, unlockable rows, possibly bogus virtual xids? (-1/4444444444)