Re: Potential G2-item cycles under serializable isolation

From: Kyle Kingsbury <aphyr(at)jepsen(dot)io>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Potential G2-item cycles under serializable isolation
Date: 2020-06-01 03:37:37
Message-ID: 11bb3199-c685-1cec-63bb-f92848edbe10@jepsen.io
Lists: pgsql-bugs

On 5/31/20 11:04 PM, Peter Geoghegan wrote:
> We generally like to produce tests for SSI, ON CONFLICT DO UPDATE, and
> anything else involving concurrent behavior using something called isolation
> tester: https://github.com/postgres/postgres/tree/master/src/test/isolation We
> may end up writing an isolation test for the issue you reported as part of an
> eventual fix. You might find it helpful to review some of the existing tests.

Ah, wonderful! I don't exactly know how to plug Elle's history analysis into
this, but I think it... should be possible to write down some special cases
based on the histories I've seen.

> Could you test Postgres 9.5? It would be nice to determine if this is
> a new issue, or a regression.

I'll look into that tomorrow morning! :)

I, uh, backed off to running these tests at repeatable read (which, uh... should
be SI, right?) and found what appear to be scads of SI violations, including
read skew and even *internal* consistency anomalies. Read-only transactions
can... apparently... see changing values of a record? Here's a single
transaction which read key 21, got [1], then read key 21 again, and saw [1 2 3]:

  [[:r 21 [1]] [:r 20 [1 2]] [:r 20 [1 2]] [:r 21 [1 2 3]]]

See
http://jepsen.io.s3.amazonaws.com/analyses/postgresql-12.3/20200531T223558.000-0400.zip
-- jepsen.log from 22:36:09,907 to 22:36:09,909:

  2020-05-31 22:36:09,907{GMT}    INFO    [jepsen worker 6] jepsen.util: 6
  :invoke :txn    [[:r 21 nil] [:r 20 nil] [:r 20 nil] [:r 21 nil]]

  ...

  2020-05-31 22:36:09,909{GMT}    INFO    [jepsen worker 6] jepsen.util: 6
  :ok     :txn    [[:r 21 [1]] [:r 20 [1 2]] [:r 20 [1 2]] [:r 21 [1 2 3]]]

You can fire up Wireshark and point it at the pcap file in n1/ to
double-check--try `tcp.stream eq 4`. The BEGIN statement for this transaction is
at 22:36:09.908115. There are a bunch more anomalies called out in analysis.edn,
if it's helpful.
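
To make the internal-consistency anomaly above concrete, here is a minimal
sketch in Python. It is not Elle's actual checker, and the function name
internal_read_anomalies and the tuple encoding of the transaction are
assumptions made for illustration. It only covers the read-only case: if a
transaction never writes, two reads of the same key should return the same
value, so any disagreement is flagged.

```python
def internal_read_anomalies(txn):
    """Return (key, earlier_value, later_value) for every read that disagrees
    with the previous read of the same key in the same transaction.

    Assumes the transaction is read-only, so none of its own writes
    could explain a changed value.
    """
    seen = {}        # key -> value returned by the most recent read of that key
    anomalies = []
    for f, key, value in txn:
        if f != "r":
            raise ValueError("this sketch only handles read-only transactions")
        if key in seen and seen[key] != value:
            anomalies.append((key, seen[key], value))
        seen[key] = value
    return anomalies


# The completed transaction from jepsen.log, transcribed by hand from EDN
# into Python tuples of (function, key, value).
txn = [("r", 21, [1]), ("r", 20, [1, 2]), ("r", 20, [1, 2]), ("r", 21, [1, 2, 3])]

print(internal_read_anomalies(txn))
# prints [(21, [1], [1, 2, 3])]
```

The two reads of key 20 agree, so only key 21 is reported.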

This looks so weird that I assume I've *got* to be doing it wrong, but trawling
through the source code and pcap trace, I can't see where the mistake is. Maybe
I'll have fresher eyes in the morning. :)

Sincerely,

--Kyle
