Re: Fix for consume_xids advancing XIDs incorrectly

From: Yushi Ogiwara <btogiwarayuushi(at)oss(dot)nttdata(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fix for consume_xids advancing XIDs incorrectly
Date: 2024-10-18 05:57:36
Message-ID: 57b588db6017ba20e67980fefa388b86@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Thank you for your suggestion. Let me organize the discussion as
follows:

When I created the patch, I did not account for situations where other
transactions concurrently consume XIDs. If we establish the rule that
"consume_xids should not return an XID that is consumed by any other
transaction," I believe it will simplify the discussion.

Case: consume_xids(1)
Let's consider a scenario where transaction 1 (t1) and transaction 2
(t2) run concurrently. t1 calls txid_current() and consume_xids(1),
while t2 calls txid_current().

The schedule above illustates when t1 and t2 run concurrently. The "next
XID" column shows how the TransamVariables->nextXid global variable
changes over time. The /**/ blocks shows the value of each variable and
outcome of a called function.

In my previous patch, the schedule above could occur, violating the rule
because consume_xids does not update lastxid when GetTopTransactionId is
called.

If your suggestion is applied, consume_xids would no longer violate the
rule, as it would update lastxid when generating a new XID.

Case: consume_xids(0)

Now, consider the case where transaction 1 consumes 0 XIDs.

In this example, both t1 and t2 consume XID = 20, which violates the
rule.

Using ReadNextFullTransactionId() - 1 also does not resolve the issue,
as demonstrated above. This is because consume_xids(0) does not allocate
any XIDs.

In conclusion, I agree that:

* Update lastxid with GetTopTransactionId().
* consume_xids with 0 should raise an error.

I have attached a new patch that incorporates your suggestions.

Regards,
Yushi

On 2024-10-17 02:58, Masahiko Sawada wrote:

> On Tue, Oct 15, 2024 at 10:06 PM Yushi Ogiwara
> <btogiwarayuushi(at)oss(dot)nttdata(dot)com> wrote:
>
>> Hi,
>>
>> Thank you for your comment.
>>
>> Regarding the first patch, I believe it works correctly when
>> consume_xids(1) is called. This is because the lastxid variable in
>> the
>> consume_xids_common function is initialized as lastxid =
>> ReadNextFullTransactionId(), where the ReadNextFullTransactionId
>> function returns the (current XID) + 1.
>
> But it's possible that concurrent transactions consume XIDs in
> meanwhile, no?
>
>> Separately, I found that consume_xids(0) does not behave as expected.
>> Below is an example:
>>
>> postgres=# select txid_current();
>> txid_current
>> --------------
>> 45496
>> (1 row)
>>
>> postgres=# select consume_xids(0);
>> consume_xids
>> --------------
>> 45497
>> (1 row)
>>
>> postgres=# select consume_xids(0);
>> consume_xids
>> --------------
>> 45497
>> (1 row)
>>
>> In the example, the argument to consume_xids is 0, meaning it should
>> not
>> consume any XIDs. However, the first invocation of consume_xids(0)
>> looks
>> like unexpectedly consuming 1 XID though it's not consuming actually.
>> This happens because consume_xids(0) returns the value from
>> ReadNextFullTransactionId.
>>
>> I have updated the patch (skip.diff, attached to this e-mail) to
>> address
>> this issue. Now, when consume_xids(0) is called, it returns
>> ReadNextFullTransactionId().value - 1, ensuring no XID is consumed as
>> shown below:
>>
>> postgres=# select txid_current();
>> txid_current
>> --------------
>> 45498
>> (1 row)
>>
>> postgres=# select consume_xids(0);
>> consume_xids
>> --------------
>> 45498
>> (1 row)
>
> Hmm, I think if we expect this function to return the last XID that
> the function actually consumed, calling consume_xids with 0 should
> raise an error instead. Even if it returns
> ReadNextFullTransactionId().value - 1 as you proposed, other
> concurrent transactions might consume XIDs between txid_current() and
> consume_xids(0), resulting in consume_xids() appearing to have
> consumed XIDs.
>
> Regards,

Attachment Content-Type Size
consume_xid_fix.diff text/x-diff 975 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Seino Yuki 2024-10-18 06:00:50 Re: Add “FOR UPDATE NOWAIT” lock details to the log.
Previous Message Antonin Houska 2024-10-18 05:51:21 Re: [PoC] Federated Authn/z with OAUTHBEARER