>>> Martijn van Oosterhout <kleptog(at)svana(dot)org> wrote:
> On Mon, Dec 22, 2008 at 11:00:53AM -0600, Kevin Grittner wrote:
>> As I've understood limitations of the PostgreSQL implementation of
>> SERIALIZABLE transactions, at least the only example given in the
>> documentation, revolve around a rather unlikely situation:
>>
>> Given concurrent transactions T1 and T2 and non-overlapping sets of
>> data A and B, T1 reads data including A and uses the data to modify
B
>> while T2 reads data including B and uses that data to modify A,
where
>> the modifications performed by either would affect the
modifications
>> made by the other, if they were visible.
>
> In so far as the "modifications" are just INSERTs (no UPDATEs or
> DELETEs), yes. This case is covered in the documentation.
Let's not understate the scope of the issue.
UPDATEs and DELETEs count, too. For example:
-- connection 1
cir=> create table mytab (class int not null, value int not null);
CREATE TABLE
cir=> copy mytab from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 1 10
>> 1 20
>> 2 100
>> 2 200
>> \.
cir=> start transaction isolation level serializable;
START TRANSACTION
cir=> update mytab set value = (select sum(value) from mytab where
class = 2) where class = 1 and value = 10;
UPDATE 1
-- connection 2
cir=> start transaction isolation level serializable;
START TRANSACTION
cir=> update mytab set value = (select sum(value) from mytab where
class = 1) where class = 2 and value = 100;
UPDATE 1
cir=> commit transaction;
COMMIT
-- connection 1
cir=> commit transaction;
COMMIT
cir=> select * from mytab;
class | value
-------+-------
1 | 20
2 | 200
1 | 300
2 | 30
(4 rows)
>> Imagine, as an example, a system which involves recording receipts,
>> each of which must go into a daily deposit. There is a control
table
>> with one row containing the current deposit date for receipts.
>> Somewhere mid-afternoon that date is updated, all subsequent
receipts
>> fall into the new day, and a report is run listing the receipts for
the
>> day and giving the deposit total.
>
> This is a variation of the above and has the same "proper" solution:
> predicate locking. However, in this case the records in question are
> already present so you can workaround it easily. First do a SELECT
FOR
> UPDATE on all the records you want to update. This will serialize
all
> parallel transactions to either before or after you. Then do your
> update.
My point isn't that there aren't workarounds, it is that people might
reasonably assume that SERIALIZABLE transactions provide sufficient
concurrency control for this, since the only example we give of a
problem is a rather contrived update anomaly. The fact that even in
cases where the data settles into good form at commit leave windows
where race conditions could cause occasional bad results without extra
explicit locking is not obvious.
>> This absolutely can't happen in a standard-compliant
implementation.
>> At a minimum, this window where visible data lacks coherency should
be
>> noted in the documentation. I don't know if there's any way to fix
>> this without killing performance.
>
> Predicate locking is nasty and we don't try. I'm not sure if anybody
> else does.
I know for a fact that Sybase ASE does. I've heard from reasonably
reliable sources that DB2 does. I know that Microsoft SQL Server did
for some time after the split from the Sybase code base, but I'm not
sure they've continued that; in fact there was a reference to
concurrency issues in wikipedia which implied that they no longer do.
The implementation is not pretty -- they do it by locking accessed
pages and insertion points, and in some cases entire tables. It is so
ugly that at one point in discussing similar issues Tom said that it
couldn't really qualify as predicate locking, but in the face of the
fact that they have covered all the bases to provide true serializable
transactions, and that theory says that only predicate locking can do
that, he conceded that it was predicate locking -- but really ugly and
in a form he would never support.
Anyway, I didn't argue that we should provide truly serializable
transactions, just that we should provide a less contrived example of
where the PostgreSQL implementation can show anomalies, so that people
don't get burned through a false sense of security.
-Kevin