From: Robert Haas <robertmhaas@gmail.com>
To: Kevin Grittner <Kevin.Grittner@wicourts.gov>
Cc: Dan Ports <drkp@csail.mit.edu>, pgsql-hackers@postgresql.org
Subject: Re: patch: fix race in SSI's CheckTargetForConflictsIn
Date: 2011-05-07 01:35:39
Message-ID: BANLkTik0J5n3XefoKuqSP1kRP2ZJALv2Sw@mail.gmail.com
Lists: pgsql-hackers
On Thu, May 5, 2011 at 1:43 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Dan Ports <drkp@csail.mit.edu> wrote:
>> While running some benchmarks to test SSI performance, I found a
>> race condition that's capable of causing a segfault. A patch is
>> attached.
>>
>> The bug is in CheckTargetForConflictsIn, which scans the list of
>> SIREAD locks on a lock target when it's modified. There's an
>> optimization in there where the writing transaction will remove a
>> SIREAD lock that it holds itself, because it's being replaced with
>> a (stronger) write lock. To do that, it needs to drop its shared
>> lwlocks and reacquire them in exclusive mode. The existing code
>> deals with concurrent modifications in that interval by redoing
>> checks. However, it misses the case where some other transaction
>> removes all remaining locks on the target, and proceeds to remove
>> the lock target itself.
>>
>> The attached patch fixes this by deferring the SIREAD lock removal
>> until the end of the function. At that point, there isn't any need
>> to worry about concurrent updates to the target's lock list. The
>> resulting code is also simpler.
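For illustration, the hazardous pattern being described looks roughly
like this (identifier names are assumed for the sketch, not taken from
predicate.c):

    /* Scanning under shared locks, we found our own SIREAD lock. */
    LWLockRelease(partitionLock);               /* was held LW_SHARED */
    LWLockAcquire(partitionLock, LW_EXCLUSIVE); /* upgrade to remove  */

    /*
     * Race window: while no lock was held, another backend may have
     * removed the last remaining PREDICATELOCK on this target and then
     * removed the PREDICATELOCKTARGET itself.  Redoing the checks
     * catches concurrent changes to the lock list, but not a target
     * that no longer exists; using the stale target pointer can
     * segfault.
     */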
>
> I just want to confirm that this addresses a real (although very
> narrow) race condition. It results from code used to implement a
> valuable optimization described in Cahill's thesis: don't get or
> keep an SIREAD lock on a row which has a write lock. The write lock
> is stronger and will cause a write/write conflict with any
> overlapping transactions which would care about a read/write
> conflict. The pattern of reading a row and then updating or
> deleting it is so common that this optimization does a lot to avoid
> promotion of predicate locks to coarser granularity, and thereby
> helps avoid false positives.
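The idea in miniature, with hypothetical helper names (the real code
operates on tuple-level predicate locks inside the SSI machinery):

    if (MyXactHoldsWriteLockOn(tuple))  /* hypothetical check */
    {
        /*
         * The write lock is stronger: any overlapping serializable
         * transaction that would have noticed a read/write conflict
         * with our SIREAD lock will hit a write/write conflict with
         * our write lock instead, so the SIREAD lock is redundant.
         */
        DropSIREADLock(MyXact, tuple);  /* hypothetical helper */
    }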
>
> While the optimization is valuable, the code used to implement it
> was pretty horrid. (And that was me that wrote it.) It has already
> been the cause of several other fixes since the main patch went in.
> What Dan has done here is move the optimization out of the middle of
> the loop which is doing the conflict detection, and in doing so has
> reduced the number of lines of code needed, reduced the amount of
> fiddling with LW locks, and all around made the code more robust and
> more understandable.
>
> I've reviewed the patch and it looks good to me. Dan has beat up on
> it with the same DBT-2 run which exposed the race condition without
> seeing a problem. Although a much smaller patch could address the
> immediate problem, I strongly feel that Dan has taken the right
> approach by refactoring this bit to something fundamentally cleaner
> and less fragile.
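In outline, the refactoring being endorsed looks something like the
following sketch (helper names are invented; the patch itself is the
authoritative version):

    PREDICATELOCK *predlock;
    PREDICATELOCK *mypredlock = NULL;

    /* Scan phase: shared locks only, no mid-scan lock upgrades. */
    for (predlock = FirstPredicateLockOnTarget(target);  /* hypothetical */
         predlock != NULL;
         predlock = NextPredicateLockOnTarget(predlock)) /* hypothetical */
    {
        if (PredicateLockHeldByMe(predlock))   /* hypothetical */
            mypredlock = predlock;             /* remember; remove later */
        else
            CheckForRWConflict(predlock);      /* the real work of the scan */
    }

    /* Removal phase: runs once, after the scan, under exclusive locks. */
    if (mypredlock != NULL)
        RemoveMySIREADLock(mypredlock);        /* hypothetical */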
Why does this HASH_FIND the applicable hash table entry and then
HASH_REMOVE it as a separate step, instead of just HASH_REMOVE-ing it
in one go?
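(For reference, dynahash supports doing both in one call: with
HASH_REMOVE, hash_search() removes the matching entry and returns it,
or returns NULL if there was no match. A sketch of the one-step form,
with the tag variable name assumed:)

    rmpredlock = (PREDICATELOCK *)
        hash_search(PredicateLockHash, &mypredlocktag,
                    HASH_REMOVE, NULL);
    /*
     * Non-NULL means the entry existed and is now removed; the
     * returned pointer must not be dereferenced after removal.
     */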
Doesn't this fail to release the locks if rmpredlock == NULL?
I believe it's project style to test (rmpredlock != NULL) rather than
just (rmpredlock).
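Concretely, the shape those last two comments point at would be
something like this (lock variable names assumed, following the
conventions used elsewhere in predicate.c; still only a sketch):

    /* Release on every code path, whether or not the lock was found. */
    LWLockRelease(partitionLock);
    LWLockRelease(SerializablePredicateLockListLock);

    if (rmpredlock != NULL)
    {
        /* follow-up bookkeeping for a successful removal goes here */
    }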
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company