Re: Skip collecting decoded changes of already-aborted transactions

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Skip collecting decoded changes of already-aborted transactions
Date: 2024-11-13 03:28:55
Message-ID: CALDaNm06VtK_9C1nLBMXNWQKtsX68--pzueHFXtdoUqfHGAX1Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 11 Nov 2024 at 23:30, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Sun, Nov 10, 2024 at 11:24 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
> > Hi Sawada-San, here are some review comments for the patch v5-0001.
> >
>
> Thank you for reviewing the patch!
>
> > ======
> > Commit message.
> >
> > 1.
> > This commit introduces an additional check to determine if a
> > transaction is already aborted by a CLOG lookup, so the logical
> > decoding skips further change also when it doesn't touch system
> > catalogs.
> >
> > ~
> >
> > Is that wording backwards? Is it meant to say:
> >
> > This commit introduces an additional CLOG lookup check to determine if
> > a transaction is already aborted, so the ...
>
> Fixed.
>
> >
> > ======
> > contrib/test_decoding/sql/stats.sql
> >
> > 2
> > +SELECT slot_name, spill_txns = 0 AS spill_txn, spill_count = 0 AS
> > spill_count FROM pg_stat_replication_slots WHERE slot_name =
> > 'regression_slot_stats4_twophase';
> >
> > Why do the SELECT "= 0" like this, instead of just having zeros in the
> > "expected" results?
>
> Indeed. I used "=0" like other queries in the same file do, but it
> makes sense to me just to have zeros in the expected file. That way,
> it would make it a bit easier to investigate in case of failures.
>
> >
> > ======
> > .../replication/logical/reorderbuffer.c
> >
> > 3.
> > static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
> > - bool txn_prepared);
> > + bool txn_prepared, bool mark_streamed);
> >
> > That last parameter name ('mark_streamed') does not match the same
> > parameter name in this function's definition.
>
> Fixed.
>
> >
> > ~~~
> >
> > ReorderBufferTruncateTXN:
> >
> > 4.
> > if (txn_streaming && (!txn_prepared) &&
> > (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0)))
> > txn->txn_flags |= RBTXN_IS_STREAMED;
> >
> > if (txn_prepared)
> > {
> > ~
> >
> > Since the following condition was already "if (txn_prepared)" would it
> > be better remove the "(!txn_prepared)" here and instead just refactor
> > the code like:
> >
> > if (txn_prepared)
> > {
> > ...
> > }
> > else if (txn_streaming && (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0)))
> > {
> > ...
> > }
>
> Good idea.
>
> >
> > ~~~
> >
> > ReorderBufferProcessTXN:
> >
> > 5.
> > +
> > + /* Remember the transaction is aborted */
> > + Assert((curtxn->txn_flags & RBTXN_IS_COMMITTED) == 0);
> > + curtxn->txn_flags |= RBTXN_IS_ABORTED;
> >
> > Missing period on comment.
>
> Fixed.
>
> >
> > ~~~
> >
> > ReorderBufferCheckTXNAbort:
> >
> > 6.
> > + * If GUC 'debug_logical_replication_streaming' is "immediate", we don't
> > + * check the transaction status, so the caller always processes this
> > + * transaction. This is to disable this check for regression tests.
> > + */
> > +static bool
> > +ReorderBufferCheckTXNAbort(ReorderBuffer *rb, ReorderBufferTXN *txn)
> > +{
> > + /*
> > + * If GUC 'debug_logical_replication_streaming' is "immediate", we don't
> > + * check the transaction status, so the caller always processes this
> > + * transaction.
> > + */
> > + if (unlikely(debug_logical_replication_streaming ==
> > DEBUG_LOGICAL_REP_STREAMING_IMMEDIATE))
> > + return false;
> > +
> >
> > The wording of the sentence "This is to disable..." seemed a bit
> > confusing. Maybe this area can be simplified by doing the following.
> >
> > 6a.
> > Change the function comment to say more like below:
> >
> > When the GUC 'debug_logical_replication_streaming' is set to
> > "immediate", we don't check the transaction status, meaning the caller
> > will always process this transaction. This mode is used by regression
> > tests to avoid unnecessary transaction status checking.
> >
> > ~
> >
> > 6b.
> > It is not necessary for this 2nd comment to repeat everything that was
> > already said in the function comment. A simpler comment here might be
> > all you need:
> >
> > SUGGESTION:
> > Quick return for regression tests.
>
> Agreed with the above two comments. Fixed.
>
> >
> > ~~~
> >
> > 7.
> > Is it worth mentioning about this skipping of the transaction status
> > check in the docs for this GUC? [1]
>
> If we want to mention this optimization in the docs, we have to
> explain how the optimization works too. I think it's too detailed.
>
> I've attached the updated patch.

Few minor suggestions:
1) Can we use rbtxn_is_committed here?
+ /* Remember the transaction is aborted. */
+ Assert((curtxn->txn_flags & RBTXN_IS_COMMITTED) == 0);
+ curtxn->txn_flags |= RBTXN_IS_ABORTED;

2) Similarly here too:
+ /*
+ * Mark the transaction as aborted so we ignore future changes of this
+ * transaction.
+ */
+ Assert((txn->txn_flags & RBTXN_IS_COMMITTED) == 0);
+ txn->txn_flags |= RBTXN_IS_ABORTED;

3) Can we use rbtxn_is_aborted here?
+ /*
+ * Remember the transaction is committed so that we
can skip CLOG
+ * check next time, avoiding the pressure on CLOG lookup.
+ */
+ Assert((txn->txn_flags & RBTXN_IS_ABORTED) == 0);

Regards,
Vignesh

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message jian he 2024-11-13 03:30:47 Re: Virtual generated columns
Previous Message Masahiro Ikeda 2024-11-13 03:14:08 Re: Avoiding superfluous buffer locking during nbtree backwards scans