From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl> |
Cc: | pgsql-hackers(at)postgreSQL(dot)org |
Subject: | Re: [PATCHES] [WIP] shared locks |
Date: | 2005-04-27 23:05:40 |
Message-ID: | 3122.1114643140@sss.pgh.pa.us |
Lists: | pgsql-hackers pgsql-patches |

Found another interesting thing while testing this. I got a core dump
from the Assert in GetMultiXactIdMembers, complaining that it was being
asked about a MultiXactId >= nextMXact. Sure enough, there was a
multixact on disk, left over from a previous core-dumped test, that was
larger than the nextMXact the current postmaster had started with.

My interpretation of this is that the MultiXact code is violating the
fundamental WAL rule, namely it is allowing data (multixact IDs in data
pages) to reach disk before the relevant WAL record (here the NEXTMULTI
record that should have advanced nextMXact) got to disk. It is very
easy for this to happen in the current system if the buffer page LSNs
aren't updated properly, because the bgwriter will be industriously
dumping dirty pages in the background.
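
To make the ordering problem concrete, here is a toy model in C. It is not the PostgreSQL buffer manager: ToyWal, ToyPage and the toy_* functions are names made up for illustration, and the only thing modeled is the rule that a dirty page may reach disk only after the WAL up to that page's LSN has been flushed. A page whose LSN was never advanced passes that check trivially, which is the hole described above.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t Lsn;            /* toy log sequence number */

typedef struct {
    Lsn inserted;                /* end of the last record appended to WAL */
    Lsn flushed;                 /* end of the WAL that is durably on disk */
} ToyWal;

typedef struct {
    Lsn  lsn;                    /* LSN of the last record known to affect the page */
    bool dirty;                  /* page has changes not yet written out */
} ToyPage;

/* Append one record and return its end position. Nothing is flushed yet. */
static Lsn toy_wal_insert(ToyWal *wal)
{
    return ++wal->inserted;
}

/* Make the WAL durable up to 'upto' (clamped to what has been inserted). */
static void toy_wal_flush(ToyWal *wal, Lsn upto)
{
    if (upto > wal->inserted)
        upto = wal->inserted;
    if (upto > wal->flushed)
        wal->flushed = upto;
}

/*
 * The WAL rule as a background writer would see it: the page may be
 * written only if every record up to the page's LSN is already on disk.
 * If the page LSN was never updated, this is true even though the
 * record the page depends on is still sitting in memory.
 */
static bool toy_page_may_be_written(const ToyWal *wal, const ToyPage *page)
{
    return page->lsn <= wal->flushed;
}
```

In this sketch a page that stores a new multi ID without picking up the LSN of the NEXTMULTI-like record is allowed out immediately; once its LSN is set to that record's position, it is held back until toy_wal_flush has covered the record.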

AFAICS there isn't any very convenient way of propagating the true
location of the NEXTMULTI record into the page LSNs of the buffers that
heap_lock_tuple might stick relevant multi IDs into. What's probably
the easiest solution is for XLogPutNextMultiXactId to XLogFlush the
NEXTMULTI record before it returns. This is a mite annoying for
concurrency (because we'll have to hold MultiXactGenLock while flushing
xlog) but it should occur rarely enough to not be a huge deal.
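
A rough sketch of the flush-before-return idea, again as a self-contained toy rather than the real XLogPutNextMultiXactId. ToyState, TOY_PREFETCH and the toy_* functions are hypothetical, and the batch size of 4 is arbitrary. The assumption is that IDs are logged in batches, the way the OID generator does it, and that after a crash the counter restarts from the last limit that actually reached disk.

```c
#include <stdint.h>

typedef uint64_t Lsn;
typedef uint32_t ToyMultiId;

#define TOY_PREFETCH 4           /* hypothetical number of ids covered per record */

typedef struct {
    Lsn        inserted;         /* end of WAL appended so far */
    Lsn        flushed;          /* end of WAL durably on disk */
    ToyMultiId next_id;          /* next id to hand out (nextMXact analogue) */
    ToyMultiId logged_limit;     /* ids below this are covered by a logged record */
    ToyMultiId durable_limit;    /* ids below this are covered by a flushed record */
} ToyState;

/*
 * Log a NEXTMULTI-like record saying "ids below 'limit' may be in use".
 * With flush != 0 the record is forced to disk before returning, which
 * is the proposed fix. In the real system this would happen while the
 * generator lock is held, hence the concurrency cost.
 */
static Lsn toy_log_next_multi(ToyState *s, ToyMultiId limit, int flush)
{
    Lsn rec = ++s->inserted;

    s->logged_limit = limit;
    if (flush)
    {
        s->flushed = rec;
        s->durable_limit = limit;
    }
    return rec;
}

/* Hand out the next id, logging a new batch when the current one is used up. */
static ToyMultiId toy_get_new_multi(ToyState *s, int flush)
{
    if (s->next_id >= s->logged_limit)
        toy_log_next_multi(s, s->next_id + TOY_PREFETCH, flush);
    return s->next_id++;
}
```

Without the flush, an id can be handed out (and then written into a data page) while durable_limit still sits at or below it, so a crash at that point would restart the counter behind an id that is already on disk. With the flush, every id returned is below durable_limit, and the record is written only once per batch.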

At this point you're probably wondering why OID generation hasn't got
exactly the same problem, seeing that you borrowed all this logic from
the OID generator. The answer is that it would have the same problem,
except that an OID can only get onto disk as part of a tuple insert or
update, and all such events generate xlog records that must follow any
relevant NEXTOID record. Those records *will* get into the page LSNs,
and so the WAL rule is enforced.

So the problem would go away if heap_lock_tuple were generating any xlog
record of its own, which it might be doing by the time the 2PC dust
settles.

Plan B would be to decide that a multi ID that's >= nextMXact isn't
worthy of an Assert failure, but ought to be treated as just a dead
multixact. I'm kind of inclined to do that anyway, because I am not
convinced that this code guarantees no wraparound of multi IDs.
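
For illustration, Plan B could look something like the following toy lookup. This is not GetMultiXactIdMembers: toy_member_count and its flat array of member counts are invented for the example, returning 0 for a dead multixact is just one possible convention, and the plain `<` comparison deliberately does not attempt to handle wraparound.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ToyMultiId;

/*
 * Return how many members the given multi has, or 0 if it should be
 * treated as dead. An id at or beyond next_multi is not an error here:
 * it is assumed to be left over from before a crash, so nobody who
 * could still be holding the lock belongs to it.
 *
 * 'counts' is a hypothetical in-memory table indexed by multi id with
 * 'ncounts' entries; ids outside the table are also treated as dead.
 */
static int toy_member_count(ToyMultiId multi, ToyMultiId next_multi,
                            const int *counts, size_t ncounts)
{
    if (multi >= next_multi)
        return 0;                /* formerly an Assert failure */
    if ((size_t) multi >= ncounts)
        return 0;
    return counts[multi];
}
```

A caller would then see a stale multi as having no live lockers instead of taking the whole backend down.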

Thoughts?

regards, tom lane