Re: WIP: Detecting SSI conflicts before reporting constraint violations

From: Kevin Grittner <kgrittn(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: Detecting SSI conflicts before reporting constraint violations
Date: 2016-02-15 22:03:46
Message-ID: CACjxUsNzR0GsLeKYG+pqnPJGtqmGpAH11SbEv3BdCYeJsW4c4A@mail.gmail.com
Lists: pgsql-hackers

On Wed, Feb 3, 2016 at 5:12 PM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:

> I don't see it as a difficult choice between two reasonable
> alternatives. It quacks suspiciously like a bug.

That seems a little strong to me; I think it would be an
unacceptable change in behavior to back-patch this, for example.
On the other hand, we have had multiple reports on these lists
asserting that the behavior is a bug (not to mention several
off-list communications to me about it); it seems like a POLA
violation; it hides the information that users of serializable
transactions consider most important in favor of relatively
insignificant (to them) details about what table and key were
involved; and it causes errors to be presented to end users that
the developers would prefer to be handled discreetly in the
background. The current behavior provides this guarantee:

"Any set of successfully committed concurrent serializable
transactions will provide a result consistent with running them one
at a time in some order."

Users of serializable transactions would, in my experience,
universally prefer to strengthen that guarantee with:

"Should a serializable transaction fail only due to the action of a
concurrent serializable transaction, it should fail with a
serialization failure error."

People have had to resort to weird heuristics like performing a
limited number of retries on a duplicate key error in case it
happens to be due to a serialization problem, but that wastes
resources when it is not a serialization failure, and unnecessarily
complicates the retry framework.
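
To make that concrete, here is a rough sketch of the kind of wrapper
people end up writing today. It assumes psycopg2 as the client
library; the function name and the duplicate-key retry cap are made
up for illustration, and the connection is assumed to already be at
SERIALIZABLE isolation:

import psycopg2

MAX_DUP_RETRIES = 3   # arbitrary heuristic cap, purely illustrative

def run_serializable(conn, work):
    """Run work(cursor) in one transaction on a SERIALIZABLE connection,
    retrying when the failure may be due to a concurrent transaction."""
    dup_retries = 0
    while True:
        try:
            with conn.cursor() as cur:
                work(cur)
            conn.commit()
            return
        except psycopg2.Error as e:
            conn.rollback()
            if e.pgcode == '40001':      # serialization_failure: always retry
                continue
            if e.pgcode == '23505':      # unique_violation: *might* be a masked
                dup_retries += 1         # serialization failure, so retry a
                if dup_retries <= MAX_DUP_RETRIES:  # few times just in case
                    continue
            raise                        # otherwise surface it as a real error

With the patch, the whole 23505 branch (and the retries it wastes on
genuine duplicates) should become unnecessary, because a failure
caused by a concurrent serializable transaction should surface as
40001.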

> The theoretical problem with the current behaviour is that by
> reporting a unique constraint violation in this case, we reach an
> outcome that is neither a serialization failure nor a result that
> could occur in any serial ordering of transactions.

Well, not if you only consider successfully committed transactions. ;-)

> The overlapping
> transactions both observed that the key they planned to insert was not
> present before inserting, and therefore they can't be untangled: there
> is no serial order of the transactions where the second transaction to
> run wouldn't see the key already inserted by the first transaction and
> (presumably) take a different course of action. (If it *does* see the
> value already present in its snapshot, or doesn't even look first
> before inserting and it turns out to be present, then it really
> *should* get a unique constraint violation when trying to insert.)
>
> The practical problem with the current behaviour is that the user has
> to work out whether a unique constraint violation indicates:
>
> 1. A programming error -- something is wrong that retrying probably won't fix
>
> 2. An unfavourable outcome in the case that you speculatively
> inserted without checking whether the value was present so you were
> expecting a violation to be possible, in which case you know what
> you're doing and you can figure out what to do next, probably retry or
> give up
>
> 3. A serialization failure that has been masked because our coding
> happens to check for unique constraint violations without considering
> SSI conflicts first -- retrying will eventually succeed.
>
> It's complicated for a human to work out how to distinguish the third
> category of errors in each place where they might occur (and even to know
> that they are possible, since AFAIK the manual doesn't point it out),
> and impossible for an automatic retry-on-40001 framework to handle in
> general. SERIALIZABLE is supposed to be super easy to use (and
> devilishly hard to implement...).

This is exactly on the mark, IMO.
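
For anyone who wants to see the third category in isolation, here is
a minimal two-session sketch of the scenario Thomas describes (again
psycopg2; the table name and DSN are placeholders). Today the second
INSERT draws a duplicate key error (SQLSTATE 23505); the point of the
patch is that it would instead draw a serialization failure (40001),
which a retry loop like the one above handles automatically:

import psycopg2

def connect():
    conn = psycopg2.connect("dbname=test")            # placeholder DSN
    conn.set_session(isolation_level='SERIALIZABLE')
    return conn

setup = connect()
cur = setup.cursor()
cur.execute("DROP TABLE IF EXISTS ssi_demo")
cur.execute("CREATE TABLE ssi_demo (k int PRIMARY KEY)")
setup.commit()

c1, c2 = connect(), connect()
cur1, cur2 = c1.cursor(), c2.cursor()

# Both transactions look first and see that the key is not present,
# taking their snapshots before either insert has committed.
cur1.execute("SELECT 1 FROM ssi_demo WHERE k = 1")
cur2.execute("SELECT 1 FROM ssi_demo WHERE k = 1")
assert cur1.fetchone() is None and cur2.fetchone() is None

# Session 1 inserts the key and commits.
cur1.execute("INSERT INTO ssi_demo VALUES (1)")
c1.commit()

# Session 2 inserts the same key.  No serial ordering of the two
# transactions could reach this point, yet the error reported today
# is a unique constraint violation rather than a serialization failure.
try:
    cur2.execute("INSERT INTO ssi_demo VALUES (1)")
    c2.commit()
except psycopg2.Error as e:
    print(e.pgcode, e.diag.message_primary)
    c2.rollback()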

FWIW, at the conference in Moscow I talked with people for whom
this is their #1 feature request. (Well, actually, they also argued
it should be considered a bug fix; but after some discussion they
agreed that the current guarantee is useful and operating as
designed, so they were willing to see it treated as an enhancement.)

Another way of stating the impact of this patch is that it changes
the guarantee to:

"If you write a transaction so that it does the right thing when
run alone, it will always do the right thing as part of any mix of
serializable transactions or will fail with a serialization failure
error."

Right now we have to add:

"... or, er, maybe a duplicate key error."

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
