From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com> |
Subject: | Re: Fix premature xmin advancement during fast forward decoding |
Date: | 2025-04-24 09:57:56 |
Message-ID: | CAA4eK1+poGw0LzyHArA6mGmEkRtgkXCJoT4Whc29xPMXyxX=1A@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Apr 22, 2025 at 12:36 PM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> When analyzing some issues in another thread[1], I found a bug that fast forward
> decoding could lead to premature advancement of catalog_xmin, resulting in
> required catalog data being removed by vacuum.
>
> The issue arises because we do not build a base snapshot when decoding changes
> during fast forward mode, preventing reference to the minimum transaction ID
> that remains visible in the snapshot when determining the candidate for
> catalog_xmin. As a result, catalog_xmin was directly advanced to the oldest
> running transaction ID found in the latest running_xacts record.
>
> In code-level, SnapBuildProcessChange -> ReorderBufferSetBaseSnapshot could not
> be reached during fast forward decoding, resulting in
> rb->txns_by_base_snapshot_lsn being NULL. When advancing catalog_xmin, the
> system attempts to refer to rb->txns_by_base_snapshot_lsn via
> ReorderBufferGetOldestXmin(). However, since rb->txns_by_base_snapshot_lsn is
> NULL, it defaults to directly using running->oldestRunningXid as the candidate
> for catalog_xmin. For reference, see the details in
> SnapBuildProcessRunningXacts.
>
> See the attachment for a test(0002) to prove that the catalog data that are
> still required would be removed after premature catalog_xmin advancement during
> fast forward decoding.
>
> To fix this, I think we can allow the base snapshot to be built during fast
> forward decoding, as implemented in the patch 0001 (We already built base
> snapshot in fast-forward mode for logical message in logicalmsg_decode()).
>
The same bug was fixed for non-fast_forward mode in commit f49a80c4.
See the following code in that commit:
- LogicalIncreaseXminForSlot(lsn, running->oldestRunningXid);
+ xmin = ReorderBufferGetOldestXmin(builder->reorder);
+ if (xmin == InvalidTransactionId)
+ xmin = running->oldestRunningXid;
....
+ LogicalIncreaseXminForSlot(lsn, xmin);
Is my understanding correct? If so, then I think it is a miss in the
commit f49a80c4 to consider fast_forward mode. Please refer commit in
the commit message.
The code changes look correct to me. I have some suggestions related
to comments in the code. See attached. Please prepare patches for back
branches as well.
--
With Regards,
Amit Kapila.
Attachment | Content-Type | Size |
---|---|---|
v1_comments_amit.1.patch.txt | text/plain | 1.8 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Christoph Berg | 2025-04-24 10:21:06 | Re: extension_control_path and "directory" |
Previous Message | Arseniy Mukhin | 2025-04-24 09:53:16 | Possible incorrect comment in ginget.c |