Re: Potential G2-item cycles under serializable isolation

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Kyle Kingsbury <aphyr(at)jepsen(dot)io>, Peter Geoghegan <pg(at)bowt(dot)ie>, Kevin Grittner <kgrittn(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Potential G2-item cycles under serializable isolation
Date: 2020-06-11 22:14:47
Message-ID: CA+hUKGK2H_RdJX5uYGSTkkazb2Ct=dFXVWcVxBsPk3+Za6790Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Fri, Jun 12, 2020 at 7:30 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > + Access to the system catalogs is not done using the isolation level
> > + of the current transaction. This has the effect that newly created
> > + database objects such as tables become visible to concurrent Repeatable
> > + Read and Serializable transactions, even though their contents does not.

> Should it be pointed out that that's about *internal* accesses to
> catalogs? This could be understood to apply to a SELECT * FROM pg_class;
> where it actually doesn't apply?

Yeah. I hesitated about writing that because they don't exactly use
the current isolation level in its full glory, they just use the
current snapshot. Specifically, predicate locks are not taken on
catalogs. That seemed a bit beyond my personal murkiness threshold
for inclusion in the manual, but let me see if I can come up with the
right weasel words. Attached, but also inline here for review:

+ <para>
+ Internal access to the system catalogs is not done using the isolation
+ level of the current transaction. This means that newly created
+ database objects such as tables are visible to concurrent Repeatable Read
+ and Serializable transactions, even though the rows they contain are not.
+ In contrast, queries that explicitly examine the system catalogs don't
+ see rows representing concurrently created database objects in the higher
+ isolation levels.
+ </para>

On Fri, Jun 12, 2020 at 7:48 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2020-06-11 12:30:23 -0700, Andres Freund wrote:
> > Could it be worthwhile to narrow the "differences in behaviour" bit to
> > read-write transactions? IME the biggest reason people explicitly use RR
> > over RC is to avoid phantom reads in read-only transactions. Seems nicer
> > to not force users to read an academic paper to figure that out?
>
> But, on second thought, it might be too difficult to phrase this
> concisely and correctly, given the annoying issue of SI allowing for
> read-only transactions to observe violations of serializability. I don't
> think that's a RR violation, but maybe it could be understood as being
> about serializability too easily?

The goal of this new paragraph is to highlight that there is a
difference between traditional System R-style RR (as assumed but not
required by the SQL spec, see disputes) and SI. RC vs RR etc is
covered elsewhere.

I agree that it's way over the top to go into the gory details in our
manual, as Kyle suggested, though; my goal here is just to flag up
this key point in as few words as possible. Kyle is of course right:
it might affect someone coming from another database product or
database theory textbook. So, we should state the proper name of the
exact technique we are using, and refer the curious to primary sources
for more.

Here's a new attempt at that. Attached, but I'll also just include
the new paragraph here because it's short:

+ <para>
+ The Repeatable Read isolation level is implemented using a technique
+ known in academic database literature and in some other database products
+ as <firstterm>Snapshot Isolation</firstterm>. Differences in behavior
+ and performance may be observed when compared with systems using a more
+ traditional technique that permits less concurrency, and some other
+ systems may even offer Repeatable Read and Snapshot Isolation as distinct
+ isolation levels. The allowable phenomena that can be used to distinguish
+ the two techniques were not formalized by database researchers until after
+ the SQL Standard was developed, and are outside the scope of this manual.
+ For a full treatment, please see <xref linkend="berenson95"/>.
+ </para>

I also tweaked the new SSI para based on Kyle's feedback, to mention
performance. Kevin and Dan's paper, and plenty more papers referenced
from there, cover S2PL vs SSI pretty well if you want to know more
about that. So I think that external reference is the right way to go
here too.

+ <para>
+ The Serializable isolation level is implemented using a technique
+ known in academic database literature as Serializable Snapshot Isolation,
+ which builds on Snapshot Isolation by adding checks for serialization
+ anomalies. Some differences in behavior and performance may be observed
+ when compared with systems using other techniques. Please see
+ <xref linkend="gritt2012"/> for detailed information.
+ </para>

Attachment Content-Type Size
v2-0001-Doc-Add-references-for-SI-and-SSI.patch text/x-patch 5.6 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Peter Geoghegan 2020-06-11 22:51:42 Re: Potential G2-item cycles under serializable isolation
Previous Message legrand legrand 2020-06-11 20:33:36 RE: pg_stat_statements: duplicated external query texts with MSY2