Re: clog_redo causing very long recovery time

From: Joe Conway <mail(at)joeconway(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: clog_redo causing very long recovery time
Date: 2011-05-06 03:41:10
Message-ID: 4DC36DD6.8030405@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 05/05/2011 08:22 PM, Tom Lane wrote:
> Joseph Conway <mail(at)joeconway(dot)com> writes:
>> The attached fix-clogredo diff is my proposal for a fix for this.
>
> That seems pretty grotty :-(
>
> I think a more elegant fix might be to just swap the order of the
> ExtendCLOG and ExtendSUBTRANS calls in GetNewTransactionId. The
> reason that would help is that pg_subtrans isn't WAL-logged, so if
> we succeed doing ExtendSUBTRANS and then fail in ExtendCLOG, we
> won't have written any XLOG entry, and thus repeated failures will not
> result in repeated XLOG entries. I seem to recall having considered
> exactly that point when the clog WAL support was first done, but the
> scenario evidently wasn't considered when subtransactions were stuck
> in :-(.

Yes, that does seem much nicer :-)

> It would probably also help to put in a comment admonishing people
> to not add stuff right there. I see the SSI guys have fallen into
> the same trap.

Right -- I think another similar problem exists in GetNewMultiXactId
where ExtendMultiXactOffset could succeed and write an XLOG entry and
then ExtendMultiXactMember could fail before advancing nextMXact. The
problem in this case is that they both write XLOG entries, so a simple
reversal doesn't help.

Joe

--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2011-05-06 04:00:53 Re: clog_redo causing very long recovery time
Previous Message Alvaro Herrera 2011-05-06 03:29:13 Re: clog_redo causing very long recovery time