I want to try to get agreement on whether it would be a good idea to
implement serializable transactions, and on what that would look like
from the user side. At this point, we should avoid discussions of
whether it's possible or how it would be implemented, but focus on
what that would look like and whether it would be desirable.
Let's start with reasons:
(1) The standard has always required that the serializable
transaction isolation mode be supported. We don't comply with recent
versions of the standard, which have changed the definition of this
mode to go beyond the four specific anomalies mentioned, and now
require that any execution of concurrent serializable transactions
must yield results consistent with some serial execution of those
transactions. Being able to show compliance with a significant point
in the standard has value, all by itself.
(2) The standard requires this because it is the only cost-effective
way to ensure data integrity in some environments, particularly those
with a large number of programmers, tables, and queries; and which
have complex data integrity rules. Basically, any serializable
transaction which can be shown to do the right thing when run by
itself will automatically, with no additional development effort, do
the right thing when run in any arbitrary mix of concurrent
transactions. This feature would be likely to make PostgreSQL a
viable option in some shops where it currently isn't.
(3) Many other database products provide serializable transactions,
including DB2, Microsoft SQL Server, and Sybase ASE. Some MVCC
databases, like recent Microsoft SQL Server releases, allow the user
to choose snapshot isolation or full serializable isolation.
(4) It may simplify the code which implements PostgreSQL foreign key
constraints and/or improve concurrency in the face of such
constraints.
(5) It may simplify application code written for PostgreSQL and
improve concurrency of transactions with possible conflicts, since
explicit locks will no longer need to be taken and the blocking they
currently cause can be eliminated.
Proposed user visible aspects are:
(A) Well-known anomalies possible under snapshot isolation will not
be possible among transactions running at the serializable transaction
isolation level, with no need to explicitly take locks to prevent
them.
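As an illustration of (A), here is the classic "write skew" anomaly,
sketched against a hypothetical doctors table (the table and names are
invented for this example) where the business rule is that at least one
doctor must remain on call:

```sql
CREATE TABLE doctors (name text PRIMARY KEY, on_call boolean NOT NULL);
INSERT INTO doctors VALUES ('alice', true), ('bob', true);

-- Session 1:
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM doctors WHERE on_call;  -- sees 2, so it's safe to go off call
UPDATE doctors SET on_call = false WHERE name = 'alice';

-- Session 2, before session 1 commits:
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM doctors WHERE on_call;  -- also sees 2
UPDATE doctors SET on_call = false WHERE name = 'bob';

-- Both sessions then COMMIT.
```

Under snapshot isolation both commits succeed, because neither
transaction writes a row the other wrote -- leaving nobody on call,
a result no serial execution of the two transactions could produce.
Under true serializable isolation, one of the two would instead be
rolled back with a serialization failure.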
(B) While no blocking will occur between reads and writes, certain
combinations of reads and writes will cause a rollback with a SQLSTATE
which indicates a serialization failure (SQLSTATE '40001'). Any
application using this isolation level must be prepared to handle such
failures, typically by retrying the failed transaction from the
beginning.
(C) One or more GUCs will be added to control whether the new
behavior is used when serializable transaction isolation is requested
or whether, for compatibility with older PostgreSQL releases, the
transaction actually runs with snapshot isolation. In any event, a
request for repeatable read mode will provide the existing snapshot
isolation mode.
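To sketch (C) concretely -- the first SET below uses the existing GUC,
while the second name is invented purely for illustration and does not
exist:

```sql
-- Existing GUC: request serializable isolation by default.
SET default_transaction_isolation = 'serializable';

-- Hypothetical GUC (name invented for illustration): when on,
-- a request for serializable isolation silently runs under snapshot
-- isolation, matching the behavior of older PostgreSQL releases.
SET serializable_uses_snapshot = off;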
(D) It may be desirable to use these techniques, rather than current
techniques, to enforce the referential integrity specified by foreign
keys. If this is done, enforcement would produce less blocking, but
might increase rollbacks due to serialization failures. Perhaps this
should be controlled by a separate GUC.
(E) Since there will be a trade-off between the overhead of finer
granularity in tracking locks and the reduced number of rollbacks at a
finer granularity, it might be desirable to have a GUC to control
default granularity and a table property which can override the
default for individual tables. (In practice, with a different
database product which supported something like this, we found our
best performance with page level locks on all but a few small,
frequently updated tables -- which we set to row level locking.)
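As a sketch of (E), again with names invented purely for illustration
(neither the GUC nor the table storage parameter exists), using the
existing ALTER TABLE ... SET (...) reloptions syntax:

```sql
-- Hypothetical default granularity for predicate-lock tracking.
SET default_lock_granularity = 'page';

-- Hypothetical per-table override for a small, frequently updated table.
ALTER TABLE hot_counters SET (lock_granularity = 'row');
```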
(F) Database clusters making heavy use of serializable transactions
would need to boost the number of locks per transaction.
Thoughts?
-Kevin