Re: [PATCHES] Restartable Recovery

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Andreas Seltenreich <andreas+pg(at)gate450(dot)dyndns(dot)org>, Marko Kreen <markokr(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: [PATCHES] Restartable Recovery
Date: 2006-07-16 16:40:46
Message-ID: 6643.1153068046@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> On Sun, 2006-07-16 at 10:51 -0400, Tom Lane wrote:
>> Ouch. That's a bit nasty. You can't just apply a postponed split at
>> checkpoint time, because the WAL record could easily be somewhere after
>> the checkpoint, leading to duplicate insertions.

> To do this we would need to have another rmgr specific routine that gets
> called at a recovery checkpoint. This would then write to disk the
> current state of the incomplete multi-WAL actions, in some manner.
> During the startup routines we would check for any pre-existing state
> files and use those to initialise the incomplete action cache. Cleanup
> would then discard all state files.

I thought about that too, but it seems very messy; e.g., you'd have to
actually fsync the state files to be sure they were safely down to disk.
Another problem is that WAL records between the checkpoint's REDO point
and the physical checkpoint location could get replayed twice, leading
to duplicate entries in the rmgr's state. Consider a split start WAL
entry located in that range, with the split completion entry after the
checkpoint --- on restart, we'd load a pending-split entry from the
state file and then create another one on seeing the split-start record
again.
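
To make the double-replay case concrete, here is a toy sketch in C. It is
only an illustration under stated assumptions: the types, the function names
and the LSN values (REDO point 100, checkpoint at 120) are all hypothetical,
and nothing here is the actual xlog or rmgr code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names are PostgreSQL's. */
typedef enum { SPLIT_START, SPLIT_COMPLETE } RecKind;

typedef struct {
    unsigned long lsn;   /* position of the record in the WAL */
    RecKind kind;
    int split_id;        /* which split this record belongs to */
} WalRec;

#define MAX_PENDING 8

typedef struct {
    int ids[MAX_PENDING];
    size_t n;
} PendingSplits;

/* Redo one record against the rmgr's in-memory pending-split list. */
void redo_record(PendingSplits *p, const WalRec *rec)
{
    if (rec->kind == SPLIT_START) {
        assert(p->n < MAX_PENDING);
        p->ids[p->n++] = rec->split_id;
    } else {
        /* A completion record cancels exactly one matching entry. */
        for (size_t i = 0; i < p->n; i++) {
            if (p->ids[i] == rec->split_id) {
                p->ids[i] = p->ids[--p->n];
                break;
            }
        }
    }
}

/* Replay every record with from_lsn <= lsn < upto_lsn. */
void replay_range(PendingSplits *p, const WalRec *wal, size_t nrec,
                  unsigned long from_lsn, unsigned long upto_lsn)
{
    for (size_t i = 0; i < nrec; i++)
        if (wal[i].lsn >= from_lsn && wal[i].lsn < upto_lsn)
            redo_record(p, &wal[i]);
}

/*
 * Restart from a recovery checkpoint whose REDO point is 100 and whose
 * physical location is 120 (assumed values). Returns the number of
 * pending splits left after replaying to the end of the WAL.
 */
size_t pending_after_restart(bool load_state_file)
{
    const WalRec wal[] = {
        {110, SPLIT_START, 1},     /* between REDO point and checkpoint */
        {130, SPLIT_COMPLETE, 1},  /* after the checkpoint */
    };
    const size_t nrec = sizeof wal / sizeof wal[0];
    PendingSplits state = { {0}, 0 };

    if (load_state_file) {
        /* The state file captured what was pending at LSN 120. */
        replay_range(&state, wal, nrec, 0, 120);
    }
    /* Restart always replays from the REDO point, so 110 is seen again. */
    replay_range(&state, wal, nrec, 100, 200);
    return state.n;
}
```

In this model, pending_after_restart(true) leaves one stale pending-split
entry behind, while pending_after_restart(false) ends with none.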

A compromise that might be good enough is to add an rmgr routine defined
as "bool is_idle(void)" that tests whether the rmgr has any open state
to worry about. Then, recovery checkpoints are done only if all rmgrs
say they are idle. That is, we only checkpoint if there is not a need
for any state files. At least for btree's usage, this should be all
right since the "split pending" state is short-lived and so most of the
time we'd not need to skip checkpoints. I'm not totally sure about GiST
or GIN though (Teodor?).
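
A rough sketch of how that check might look, in C. Only the
"bool is_idle(void)" signature comes from the proposal above; the table
layout, the btree counter and every other name are hypothetical, and this is
not how the real rmgr table is declared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-rmgr entry; only the is_idle signature is from the text. */
typedef struct {
    const char *name;
    bool (*is_idle)(void);   /* NULL means the rmgr never holds open state */
} RmgrEntry;

/* Assumed stand-in for btree's list of incomplete splits. */
static int btree_pending_splits = 0;

void btree_split_started(void)
{
    btree_pending_splits++;
}

void btree_split_completed(void)
{
    assert(btree_pending_splits > 0);
    btree_pending_splits--;
}

bool btree_is_idle(void)
{
    return btree_pending_splits == 0;
}

static const RmgrEntry rmgr_table[] = {
    { "heap",  NULL },
    { "btree", btree_is_idle },
};

/* A recovery checkpoint is allowed only if every rmgr reports idle. */
bool recovery_checkpoint_allowed(void)
{
    for (size_t i = 0; i < sizeof rmgr_table / sizeof rmgr_table[0]; i++) {
        if (rmgr_table[i].is_idle != NULL && !rmgr_table[i].is_idle())
            return false;
    }
    return true;
}
```

With this shape, a checkpoint requested while a split is pending is simply
skipped and retried later, so no state file is ever needed.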

regards, tom lane
