From: Robert Haas <robertmhaas@gmail.com>
To: Kevin Grittner <Kevin.Grittner@wicourts.gov>
Cc: Dan Ports <drkp@csail.mit.edu>, pgsql-hackers@postgresql.org
Subject: Re: patch: fix race in SSI's CheckTargetForConflictsIn
Date: 2011-05-07 01:35:39
Message-ID: BANLkTik0J5n3XefoKuqSP1kRP2ZJALv2Sw@mail.gmail.com
Lists: pgsql-hackers
On Thu, May 5, 2011 at 1:43 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Dan Ports <drkp@csail.mit.edu> wrote:
>> While running some benchmarks to test SSI performance, I found a
>> race condition that's capable of causing a segfault. A patch is
>> attached.
>>
>> The bug is in CheckTargetForConflictsIn, which scans the list of
>> SIREAD locks on a lock target when it's modified. There's an
>> optimization in there where the writing transaction will remove a
>> SIREAD lock that it holds itself, because it's being replaced with
>> a (stronger) write lock. To do that, it needs to drop its shared
>> lwlocks and reacquire them in exclusive mode. The existing code
>> deals with concurrent modifications in that interval by redoing
>> checks. However, it misses the case where some other transaction
>> removes all remaining locks on the target, and proceeds to remove
>> the lock target itself.
>>
>> The attached patch fixes this by deferring the SIREAD lock removal
>> until the end of the function. At that point, there isn't any need
>> to worry about concurrent updates to the target's lock list. The
>> resulting code is also simpler.
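For illustration, the hazardous pattern being described looks roughly
like this (identifier names are assumed for the sketch, not taken from
predicate.c):

    /* Scanning under shared locks, we found our own SIREAD lock. */
    LWLockRelease(partitionLock);               /* was held LW_SHARED */
    LWLockAcquire(partitionLock, LW_EXCLUSIVE); /* upgrade to remove  */

    /*
     * Race window: while no lock was held, another backend may have
     * removed the last remaining PREDICATELOCK on this target and then
     * removed the PREDICATELOCKTARGET itself.  Redoing the checks
     * catches concurrent changes to the lock list, but not a target
     * that no longer exists; using the stale target pointer can
     * segfault.
     */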
>
> I just want to confirm that this addresses a real (although very
> narrow) race condition. It results from code used to implement a
> valuable optimization described in Cahill's thesis: don't get or
> keep an SIREAD lock on a row which has a write lock. The write lock
> is stronger and will cause a write/write conflict with any
> overlapping transactions which would care about a read/write
> conflict. The pattern of reading a row and then updating or
> deleting it is so common that this optimization does a lot to avoid
> promotion of predicate locks to coarser granularity, and thereby
> helps avoid false positives.
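The idea in miniature, with hypothetical helper names (the real code
operates on tuple-level predicate locks inside the SSI machinery):

    if (MyXactHoldsWriteLockOn(tuple))  /* hypothetical check */
    {
        /*
         * The write lock is stronger: any overlapping serializable
         * transaction that would have noticed a read/write conflict
         * with our SIREAD lock will hit a write/write conflict with
         * our write lock instead, so the SIREAD lock is redundant.
         */
        DropSIREADLock(MyXact, tuple);  /* hypothetical helper */
    }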
>
> While the optimization is valuable, the code used to implement it
> was pretty horrid. (And that was me that wrote it.) It has already
> been the cause of several other fixes since the main patch went in.
> What Dan has done here is move the optimization out of the middle of
> the loop which is doing the conflict detection, and in doing so has
> reduced the number of lines of code needed, reduced the amount of
> fiddling with LW locks, and all around made the code more robust and
> more understandable.
>
> I've reviewed the patch and it looks good to me. Dan has beat up on
> it with the same DBT-2 run which exposed the race condition without
> seeing a problem. Although a much smaller patch could address the
> immediate problem, I strongly feel that Dan has taken the right
> approach by refactoring this bit to something fundamentally cleaner
> and less fragile.
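In outline, the refactoring being endorsed looks something like the
following sketch (helper names are invented; the patch itself is the
authoritative version):

    PREDICATELOCK *predlock;
    PREDICATELOCK *mypredlock = NULL;

    /* Scan phase: shared locks only, no mid-scan lock upgrades. */
    for (predlock = FirstPredicateLockOnTarget(target);  /* hypothetical */
         predlock != NULL;
         predlock = NextPredicateLockOnTarget(predlock)) /* hypothetical */
    {
        if (PredicateLockHeldByMe(predlock))   /* hypothetical */
            mypredlock = predlock;             /* remember; remove later */
        else
            CheckForRWConflict(predlock);      /* the real work of the scan */
    }

    /* Removal phase: runs once, after the scan, under exclusive locks. */
    if (mypredlock != NULL)
        RemoveMySIREADLock(mypredlock);        /* hypothetical */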
Why does this HASH_FIND the applicable hash table entry and then
HASH_REMOVE it as a separate step, instead of just HASH_REMOVE-ing it
in one go?
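(For reference, dynahash supports doing both in one call: with
HASH_REMOVE, hash_search() removes the matching entry and returns it,
or returns NULL if there was no match. A sketch of the one-step form,
with the tag variable name assumed:)

    rmpredlock = (PREDICATELOCK *)
        hash_search(PredicateLockHash, &mypredlocktag,
                    HASH_REMOVE, NULL);
    /*
     * Non-NULL means the entry existed and is now removed; the
     * returned pointer must not be dereferenced after removal.
     */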
Doesn't this fail to release the locks if rmpredlock == NULL?
I believe it's project style to test (rmpredlock != NULL) rather than
just (rmpredlock).
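Concretely, the shape those last two comments point at would be
something like this (lock variable names assumed, following the
conventions used elsewhere in predicate.c; still only a sketch):

    /* Release on every code path, whether or not the lock was found. */
    LWLockRelease(partitionLock);
    LWLockRelease(SerializablePredicateLockListLock);

    if (rmpredlock != NULL)
    {
        /* follow-up bookkeeping for a successful removal goes here */
    }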
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company