Re: patch: fix race in SSI's CheckTargetForConflictsIn

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Dan Ports <drkp(at)csail(dot)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: patch: fix race in SSI's CheckTargetForConflictsIn
Date: 2011-05-07 01:35:39
Message-ID: BANLkTik0J5n3XefoKuqSP1kRP2ZJALv2Sw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, May 5, 2011 at 1:43 PM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> Dan Ports <drkp(at)csail(dot)mit(dot)edu> wrote:
>> While running some benchmarks to test SSI performance, I found a
>> race condition that's capable of causing a segfault. A patch is
>> attached.
>>
>> The bug is in CheckTargetForConflictsIn, which scans the list of
>> SIREAD locks on a lock target when it's modified. There's an
>> optimization in there where the writing transaction will remove a
>> SIREAD lock that it holds itself, because it's being replaced with
>> a (stronger) write lock. To do that, it needs to drop its shared
>> lwlocks and reacquire them in exclusive mode. The existing code
>> deals with concurrent modifications in that interval by redoing
>> checks. However, it misses the case where some other transaction
>> removes all remaining locks on the target, and proceeds to remove
>> the lock target itself.
>>
>> The attached patch fixes this by deferring the SIREAD lock removal
>> until the end of the function. At that point, there isn't any need
>> to worry about concurrent updates to the target's lock list. The
>> resulting code is also simpler.
>
> I just want to confirm that this addresses a real (although very
> narrow) race condition.  It results from code used to implement a
> valuable optimization described in Cahill's thesis: don't get or
> keep an SIREAD lock on a row which has a write lock.  The write lock
> is stronger and will cause a write/write conflict with any
> overlapping transactions which would care about a read/write
> conflict.  The pattern of reading a row and then updating or
> deleting it is so common that this optimization does a lot to avoid
> promotion of predicate locks to coarser granularity, and thereby
> helps avoid false positives.
>
> While the optimization is valuable, the code used to implement it
> was pretty horrid.  (And that was me that wrote it.)  It has already
> been the cause of several other fixes since the main patch went in.
> What Dan has done here is move the optimization out of the middle of
> the loop which is doing the conflict detection, and in doing so has
> reduced the number of lines of code needed, reduced the amount of
> fiddling with LW locks, and all around made the code more robust and
> more understandable.
>
> I've reviewed the patch and it looks good to me.  Dan has beat up on
> it with the same DBT-2 run which exposed the race condition without
> seeing a problem.  Although a much smaller patch could address the
> immediate problem, I strongly feel that Dan has taken the right
> approach by refactoring this bit to something fundamentally cleaner
> and less fragile.

Why does this HASH_FIND the applicable hash table entries and then
HASH_REMOVE it as a separate step, instead of just HASH_REMOVE-ing it
in one go?

Doesn't this fail to release the locks if rmpredlock == NULL?

I believe it's project style to test (rmpredlock != NULL) rather than
just (rmpredlock).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2011-05-07 01:50:48 Re: kill -KILL: What happens?
Previous Message Robert Haas 2011-05-07 01:27:59 Re: Why not install pgstattuple by default?