From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
---|---|
To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Race condition in GetOldestActiveTransactionId() |
Date: | 2017-07-13 12:52:44 |
Message-ID: | e4c4e884-0c00-2f44-2a20-1b29a031349d@iki.fi |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 08/22/2016 01:46 PM, Heikki Linnakangas wrote:
> While hacking on the CSN patch, I spotted a race condition between
> GetOldestActiveTransactionId() and GetNewTransactionId().
> GetOldestActiveTransactionId() calculates the oldest XID that's still
> running, by doing:
>
> 1. Read nextXid, without a lock. This is the upper bound, if there are
> no active XIDs in the proc array.
> 2. Loop through the proc array, making note of the oldest XID.
>
> While GetNewTransactionId() does:
>
> 1. Read and Increment nextXid
> 2. Set MyPgXact->xid.
>
> It seems possible that if you call GetNewTransactionId() concurrently
> with GetOldestActiveTransactionId(), GetOldestActiveTransactionId() sees
> the new nextXid value that the concurrent GetNewTransactionId() set, but
> doesn't see the old XID in the proc array. It will return a value that
> doesn't cover the old XID, i.e. it won't consider the just-assigned XID
> as in-progress.
>
> Am I missing something? Commit c6d76d7c added a comment to
> GetOldestActiveTransactionId() explaining why it's not necessary to
> acquire XidGenLock there, but I think it missed the above race condition.
>
> GetOldestActiveTransactionId() is not performance-critical, it's only
> called when performing a checkpoint, so I think we should just bite the
> bullet and grab the lock. Per attached patch.
I did some testing, and was able to indeed construct a case where
oldestActiveXid was off-by-one in an online checkpoint record. However,
looking at how it's used, I think it happened to not have any
user-visible effect. The oldestActiveXid value of an online checkpoint
is only used to determine the point where pg_subtrans is initialized,
and the XID missed in the computation could not be a subtransaction.
Nevertheless, it's clearly a bug and the fix is simple, so I committed a
fix.
- Heikki
From | Date | Subject | |
---|---|---|---|
Next Message | Marina Polyakova | 2017-07-13 14:34:14 | Re: WIP Patch: Pgbench Serialization and deadlock errors |
Previous Message | Andrew Dunstan | 2017-07-13 12:34:28 | Re: pl/perl extension fails on Windows |