Re: Overhead cost of Serializable Snapshot Isolation

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Greg Sabino Mullane <greg(at)turnstep(dot)com>
Subject: Re: Overhead cost of Serializable Snapshot Isolation
Date: 2011-10-12 13:28:25
Message-ID: CA+TgmoY7YEt0fJtJb20wO+gT-Ffsz1LApe=W85DTPNLWrHxarQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Oct 12, 2011 at 8:44 AM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
> With such a switch, every application that relies on true serializability for
> correctness would be prone to silent data corruption should the switch ever
> get set to "off" accidentally.

Agreed.

> Without such a switch, OTOH, all that will happen are a few more aborts due to
> serialization errors in applications that request SERIALIZABLE when they really
> only need REPEATABLE READ. Which, in the worst case, is a performance issue,
> but never an issue of correctness.

Right. And, in fairness:

1. The benchmark that I did was probably close to a worst-case
scenario for SSI. Since there are no actual writes, there is no
possibility of serialization conflicts, but the system must still be
prepared for the possibility of a write (and, thus, potentially, a
conflict) at any time. In addition, all of the transactions are very
short, magnifying the effect of transaction start and cleanup
overhead. In real life, people who have this workload are unlikely to
use serializable mode in the first place. The whole point of
serializability (not just SSI) is that it helps prevent anomalies when
you have complex transactions that could allow subtle serialization
anomalies to creep in. Single-statement transactions that read (or
write) values based on a primary key are not the workload where you
have that problem. You'd probably be happy to turn off MVCC
altogether if we had an option for that.

2. Our old SERIALIZABLE behavior (now REPEATABLE READ) is a pile of
garbage. Since Kevin started beating the drum about SSI, I've come
across (and posted about) situations where REPEATABLE READ causes
serialization anomalies that don't exist at the READ COMMITTED level
(which is exactly the opposite of what is really supposed to happen -
REPEATABLE READ is supposed to provide more isolation, not less); and
Kevin's pointed out many situations where REPEATABLE READ utterly
fails to deliver serializable behavior. I'm not exactly thrilled with
these benchmark results, but going back to a technology that doesn't
work is not better. If individual users want to request that
defective behavior for their applications, I am fine with giving them
that option, and we have. But if people actually want serializability
and we give them REPEATABLE READ, then they're going to get wrong
behavior. The fact that we've been shipping that wrong behavior for
years and years for lack of anything better is not a reason to
continue doing it.

I agree with Tom's comment upthread that the best thing to do here is
put some effort into improving SSI. I think it's probably going to
perform adequately for the workloads where people actually need it,
but I'd certainly like to see us make it better.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
