Re: ERROR: subtransaction logged without previous top-level txn record

From: Arseny Sher <a(dot)sher(at)postgrespro(dot)ru>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, "Hsu\, John" <hsuchen(at)amazon(dot)com>, "pgsql-bugs\(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: ERROR: subtransaction logged without previous top-level txn record
Date: 2020-02-05 09:04:06
Message-ID: 87zhdx76d5.fsf@ars-thinkpad
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers


Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> writes:

> On Mon, Feb 3, 2020 at 7:16 PM Arseny Sher <a(dot)sher(at)postgrespro(dot)ru> wrote:
>> Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> writes:
>>
>> > So, doesn't this mean that it started occurring after the fix done in
>> > commit 96b5033e11 [1]? Because before that fix we wouldn't have
>> > allowed processing XLOG_XACT_ASSIGNMENT records unless we are in
>> > SNAPBUILD_FULL_SNAPSHOT state. I am not telling the fix in that
>> > commit is wrong, but just trying to understand the situation here.
>>
>> Nope. Consider again example of WAL above triggering the error:
>>
>> [ <xl_xact_assignment_1> <restart_lsn> <subxact_change> <xl_xact_assignment_2> <commit> <confirmed_flush_lsn> ]
>>
>> Decoder starting reading WAL at <restart_lsn> where he immediately reads
>> from disk snapshot serialized earlier, which makes it jump to
>> SNAPBUILD_CONSISTENT right away.
>>
>
> Sure, I understand that if we get serialized snapshot from disk, this
> problem can occur and probably we can fix by ignoring serialized
> snapshots during slot creation or something on those lines.

There is some confusion. I'll try to reword what we have here:

1) Decoding from existing slot (*not* initial snapshot construction)
starts up, immediately picks up snapshot at restart_lsn (getting into
SNAPBUILD_CONSISTENT) and in some xl_xact_assignment learns that it
hadn't seen in full (no toplevel records) transaction which it is not
even going to stream -- but still dies with "subtransation logged
without...". That's my example above, and that's what people are
complaining about. Here, usage of serialized snapshot and jump to
SNAPBUILD_CONSISTENT is not just legit, it is essential: or order to be
able to stream data since confirmed_flush_lsn, we must pick it up as we
might not be able to assemble it from scratch in time. I mean, what is
wrong here is not serialized snapshot usage but the check.

(Lengthy comment to AllocateSnapshotBuilder in my
0001-Fix-serialized-snapshot-usage-for-new-logical-slots.patch explains
why snapbuilder is not able to do FULL -> CONSISTENT transition on its
own early enough for decoding from existing slot, so the jump on
snapshot pickup is performed to CONSISTENT directly.)

This happens with or without bac2fae05c.

2) *New* slot creationg process picks up serialized snapshot and jumps
to CONSISTENT without waiting for all running xacts to finish. This is
wrong and is a bug (of very low probability), as we risk promising to
decode xacts which we might not seen in full. Sometimes it could be
arrested by "subtransation logged without..." check, but not necessarily
-- e.g. there could be no subxacts at all.

> However,
> what I am trying to understand is whether this can occur from another
> path as well. I think it might occur without using serialized
> snapshots as well because we allow decoding xl_xact_assignment record
> even when the snapshot state is not full. Say in your above example,
> even if the snapshot state is not SNAPBUILD_CONSISTENT as we haven't
> used the serialized snapshot, then also, it can lead to the above
> problem due to decoding of xl_xact_assignment. I have not tried to
> generate a test case for this, so I could easily be wrong here.

What you are suggesting here is 3), which is, well, sort of form of 1),
meaning "subxact logged..." error also pointlessly triggered, but for
new slot creation. With bac2fae0, decoder might miss first
xl_xact_assignment because it is beyond start of reading WAL but
encounter second xl_xact_assignment and die on it due to this check
before even getting FULL.

But now that I'm thinking about it, I suspect that similar could happen
even before bac2fae0. Imagine

<start_of_reading_wal> <xl_xact_assignment_1> <SNAPBUILD_FULL> <subxact_change> <xl_xact_assignment_2> <commit> ... <SNAPBUILD_CONSISTENT>

Before bac2fae0, xl_xact_assignment_1 was ignored, so
xl_xact_assignment_1 would trigger the error.

> What I am trying to get at is if the problem can only occur by using
> serialized snapshots and we fix it by somehow not using them initial
> slot creation, then ideally we don't need to remove the error in
> ReorderBufferAssignChild, but if that is not the case, then we need to
> discuss other cases as well.

So, 1) and 3) mean this is not the case.

--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Thomas Butz 2020-02-05 11:06:41 Re: BUG #16241: Degraded hash join performance
Previous Message Wimmesberger, Simon 2020-02-05 08:58:14 AW: BUG #16103: Initdb does not respect country for language

Browse pgsql-hackers by date

  From Date Subject
Next Message Antonin Houska 2020-02-05 09:50:05 Assumptions about the number of parallel workers
Previous Message Amit Langote 2020-02-05 08:54:29 Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side