From: | Simon Riggs <simon(at)2ndquadrant(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: StandbyRecoverPreparedTransactions recovers subtrans links incorrectly |
Date: | 2017-04-23 09:59:32 |
Message-ID: | CANP8+j+j_5FjeuSz4OUF1=6FnKiMN5bA9xoZD8LUR-swrNx3kA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 23 April 2017 at 00:55, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Now that we've got consistent failure reports about the 009_twophase.pl
> recovery test, I set out to find out why it's failing. It looks to me
> like the reason is that this (twophase.c:2145):
>
> SubTransSetParent(xid, subxid, overwriteOK);
>
> ought to be this:
>
> SubTransSetParent(subxid, xid, overwriteOK);
>
> because the definition of SubTransSetParent is
>
> void
> SubTransSetParent(TransactionId xid, TransactionId parent, bool overwriteOK)
>
> not the other way 'round.
>
> While "git blame" blames this line on the recent commit 728bd991c,
> that just moved the call from somewhere else. AFAICS this has actually
> been wrong since StandbyRecoverPreparedTransactions was written,
> in 361bd1662 of 2010-04-13.
Thanks for finding that.
> It's not clear to me how much potential this has to create user data
> corruption, but it doesn't look good at first glance. Discuss.
Well, strangely for different reason I was looking at that particular
call a couple of days back. I didn't spot that issue, but I was
thinking why we even make that call at that time. My conclusion was
that it could be optimized away mostly, since it isn't needed very
often, but its not really worth optimizing.
SubTransSetParent() only matters for the case where we are half-way
through a commit with a large commit. Since we do atomic updates of
commit and subcommmit when on same page, this problem would only
matter when top level xid and other subxids were separated across
multiple pages and the issue would only affect a read only query
checking visibility during the commit for that prepared transaction.
Furthermore, the nature of the corruption is that we set the xid to
point to the subxid; since we never mark a top-level commit as
subcommitted, subtrans would never be consulted and so the corruption
would not lead to any incorrect answer even during the window of
exposure. So it looks to me like this error shouldn't cause a problem.
We can fix that, but...
> Also, when I fix that, it gets further but still crashes at the same
> Assert in SubTransSetParent. The proximate cause this time seems to be
> that RecoverPreparedTransactions's calculation of overwriteOK is wrong:
> it's computing that as "false", but in reality the subtrans link in
> question has already been set.
Not sure about that, investigating.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Simon Riggs | 2017-04-23 10:05:44 | Re: StandbyRecoverPreparedTransactions recovers subtrans links incorrectly |
Previous Message | Michael Banck | 2017-04-23 09:14:21 | Re: A note about debugging TAP failures |