From: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
---|---|
To: | Paul Smith <paul(at)pscs(dot)co(dot)uk> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: ERROR: MultiXactId xxxx has not been created yet -- apparent wraparound |
Date: | 2015-05-26 18:47:36 |
Message-ID: | 20150526184736.GX5885@postgresql.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Paul Smith wrote:
> No, nothing like that. It was just running fine, and then suddenly (at 2am
> on 23 May) it started throwing up loads of these errors. The DB server
> wasn't even restarted at that point. It was just working fine, then suddenly
> wasn't. (The first error was at 02:00:32 BST, then every few minutes after
> that there's another one).
Another crazy thought is that the host system messed up and overwrote
part of the table with pages from some other guest system. I have never
seen a report of such a thing happening.
> It's running in a Hyper-V guest. We had taken a backup of the VM at 00:34 on
> 23 May and that looks to be absolutely fine. What I have done now is restore
> that backup and import the new data which arrived since that backup was
> made, and it seems OK now. I still have the 'broken' installation in case
> more information is needed from it. I'd try to get a raw dump of the damaged
> tuple data if I knew how to find where it is in the relation file...
Well, you could try a binary search to locate the (or a) borked page
within the table. Or you could write a plpgsql function with an
EXCEPTION block that reads each tuple, and in case of error reports the
CTID of the tuple being read. (select * from tab where
ctid=generated_ctid; EXCEPT WHEN OTHERS THEN raise notice ' .. ',
generated_ctid). Once you have located the problem tuple, a pageinspect
dump would be the starting point.
Another idea is to attach a debugger to the backend process, and set a
breakpoint on function errfinish. Once the error is hit the breakpoint
will stop the execution before the error is reported; at that point you
can request for a backtrace from the debugger. (Make sure to examine
errordata to see whether the error being hit is the multixact one.)
> I suppose it's possible that it was disk or memory corruption, but I've seen
> that before, and it hasn't looked like this.
Yeah, same here.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2015-05-26 18:55:15 | Re: Run pgindent now? |
Previous Message | Josh Berkus | 2015-05-26 18:20:58 | Re: fsync bug faq for publication? |