From: | Arseny Sher <a(dot)sher(at)postgrespro(dot)ru> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, "Hsu, John" <hsuchen(at)amazon(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: ERROR: subtransaction logged without previous top-level txn record |
Date: | 2020-02-03 13:46:05 |
Message-ID: | 871rrb942q.fsf@ars-thinkpad |
Lists: | pgsql-bugs pgsql-hackers |
Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> writes:
> So, doesn't this mean that it started occurring after the fix done in
> commit 96b5033e11 [1]? Because before that fix we wouldn't have
> allowed processing XLOG_XACT_ASSIGNMENT records unless we are in
> SNAPBUILD_FULL_SNAPSHOT state. I am not telling the fix in that
> commit is wrong, but just trying to understand the situation here.
Nope. Consider again the example WAL sequence above that triggers the error:
[ <xl_xact_assignment_1> <restart_lsn> <subxact_change> <xl_xact_assignment_2> <commit> <confirmed_flush_lsn> ]
The decoder starts reading WAL at <restart_lsn>, where it immediately
loads from disk a snapshot serialized earlier, which makes it jump to
SNAPBUILD_CONSISTENT right away. It never reads xl_xact_assignment_1,
but it does read xl_xact_assignment_2, already in the
SNAPBUILD_CONSISTENT state, so it hits the error regardless of that
commit.
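
To make the sequence concrete, here is a toy model of it (a sketch only,
not PostgreSQL code: the record tuples, decode() and DecodeError are
made up for illustration). It assumes the error fires when an assignment
record introduces a top-level xid the decoder has not seen yet while the
subxact already has an entry, and that a serialized snapshot is found at
the start position, so the decoder is consistent from the first record.

```python
# Toy model; names and record layout are hypothetical, not PostgreSQL's API.
class DecodeError(Exception):
    pass


# Records are (kind, xid, top_xid); top_xid is only used by assignments.
WAL = [
    ("assignment", 101, 100),  # xl_xact_assignment_1 (before restart_lsn)
    ("change", 102, None),     # subxact_change, first record at restart_lsn
    ("assignment", 102, 100),  # xl_xact_assignment_2
    ("commit", 100, None),
]


def decode(wal, start):
    """Replay records from index `start`, already in the consistent state."""
    known = set()      # xids that got an entry since `start`
    committed = []
    for kind, xid, top_xid in wal[start:]:
        if kind == "change":
            # The decoder cannot tell yet that xid is a subxact.
            known.add(xid)
        elif kind == "assignment":
            new_top = top_xid not in known
            new_sub = xid not in known
            if new_top and not new_sub:
                raise DecodeError(
                    "subtransaction logged without previous top-level txn record"
                )
            known.update((top_xid, xid))
        elif kind == "commit":
            committed.append(xid)
    return committed
```

In this model decode(WAL, 0) goes through, because xl_xact_assignment_1
registers the top-level xid first, while decode(WAL, 1) raises on
xl_xact_assignment_2; SNAPBUILD_FULL_SNAPSHOT never enters the picture.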
>> Well, almost. This is true as long initial snapshot construction process
>> goes the long way of building the snapshot by itself. If it happens to
>> pick up from disk ready snapshot pickled there by another decoding
>> session, it fast path'es to SNAPBUILD_CONSISTENT, which is technically a
>> bug as described in
>> https://www.postgresql.org/message-id/87ftjifoql.fsf%40ars-thinkpad
>>
>
> Can't we deal with this separately? If so, I think let's not mix the
> discussions for both as the root cause of both seems different.
These issues are related: before removing the check it would be nice to
ensure that there are no bugs it might protect us from (and it turns out
there actually is one, though the check won't always protect against it,
and though this bug has a very small probability). Moreover, they are
about more or less the same subject -- avoiding partially decoded xacts
-- and once you have dived deep enough to deal with one, it is
reasonable to deal with the other instead of doing that twice. But as a
practical matter, removing the check is a simple one-liner, and its
presence causes people trouble -- so I'd suggest doing that first and
then dealing with the rest. I don't think starting a new thread is
worthwhile here, but if you think it is, I can create one.
--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company