Re: [GENERAL] Fwd: How does PostgreSQL serializable transaction works? (SELECT/UPDATE vs INSERT conflict)

From: Alexandru Lazarev <alexandru(dot)lazarev(at)gmail(dot)com>
To: Kevin Grittner <kgrittn(at)gmail(dot)com>, david(dot)g(dot)johnston(at)gmail(dot)com, jeff(dot)janes(at)gmail(dot)com, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>, pgsql-sql(at)postgresql(dot)org
Subject: Re: [GENERAL] Fwd: How does PostgreSQL serializable transaction works? (SELECT/UPDATE vs INSERT conflict)
Date: 2016-03-10 19:50:48
Message-ID: CAL93h0GmoR00Vr_WeHoC-mX9UWghGpHY=cp3DpbJ++LaBgSnXA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-sql

Ok,
Now it's more clear for me. Thanks to all, especially on @Kevin deep
explanation and (tx3) example. Question is closed.

I intuited that it might be as all of You explained, but was not sure, I
was confused by "predicate lock" - I thought it's related to `SELECT+WHERE`
and not to data (a kind of "subject lock").
Now I understood that key-words are "serial execution in any order will
lead to conflict" - here I also was a little bit confused by chronological
order of commit.

P.S. One more "offtop" question - What kind of frameworks do automatically
retries for failed transactions? Are Hibernate/Spring in that list?

Best Regards,
AlexL

On Thu, Mar 10, 2016 at 12:41 AM, Kevin Grittner <kgrittn(at)gmail(dot)com> wrote:

> On Wed, Mar 9, 2016 at 1:39 PM, Alexandru Lazarev
> <alexandru(dot)lazarev(at)gmail(dot)com> wrote:
>
> Jeff's answer is entirely correct; I'm just going to go into more
> detail -- just in case you're interested enough to work through it.
>
> > `CREATE TABLE public.foo (id serial PRIMARY KEY, mynum integer);`
> >
> > and following data
> >
> > id | mynum
> > ----+-------
> > 1 | 10
> > 2 | 10
> > 3 | 10
> > 4 | 10
> > (4 rows)
> >
> > I run 2 serialize transactions in parallel (2 `psql` consoles):
> >
> > -- both transactions
> > mydb=# begin;
> > BEGIN
> > mydb=# set transaction isolation level serializable;
> > SET
> >
> > -- tx1
> > mydb=# select * from foo where mynum < 100;
> > id | mynum
> > ----+-------
> > 1 | 10
> > 2 | 10
> > 3 | 10
> > 4 | 10
> > (4 rows)
> > --tx1: Shouldn't freeze data visible for tx1 select?
>
> Yes, tx1 does have a snapshot which will guarantee that it sees a
> repeatable set view of the data for this predicate.
>
> > --tx2
> > mydb=# insert into foo (mynum) values (10);
> > INSERT 0 1
> > -- tx2 will insert next row with id 5 in foo table
> > -- Shouldn't insert of tx2 broke data snapshot visible for tx1?
>
> The snapshot tx1 has guarantees that overlapping changes won't
> change it's view of things, and there is no reason for anything to
> be blocked or canceled here. The insert creates what is called a
> read-write dependency (or rw-conflict for short) that establishes
> that in any serial ordering of a set of transactions which includes
> tx1 and tx2, tx1 must precede tx2 in the apparent order of
> execution.
>
> > --tx1
> > mydb=# update foo set mynum = 20 where id < 100;
> > UPDATE 4
> > -- Shouldn't here appear serialization fail or at least on tx1
> commit?
>
> No, there is no cycle in the apparent order of execution. The
> snapshot for tx1 still limits it to the same set of rows, and there
> is nothing visible that is inconsistent with tx1 running before
> tx2.
>
> > --tx2
> > mydb=# commit;
> > COMMIT
> >
> > --tx1
> > mydb=# commit;
> > COMMIT
> > -- tx1 Commit is OK - no any error
>
> According to the SQL standard, and in the PostgreSQL implementation
> of SERIALIZABLE transactions, commit order does not, by itself,
> establish apparent order of execution.
>
> > -- implicit tx
> > mydb=# select * from foo;
> > id | mynum
> > ----+-------
> > 1 | 20
> > 2 | 20
> > 3 | 20
> > 4 | 20
> > 5 | 10
> > (5 rows)
>
> As Jeff said, this is consistent with the implicit transaction
> running last, so tx1 -> tx2 -> implicit_tx.
>
> Now, you are pretty close to a situation which does need to trigger
> a serialization failure -- just switch the commit of tx1 and the
> implicit transaction. If tx2 has committed but tx1 has not yet
> committed:
>
> mydb=# select * from foo;
> id | mynum
> ----+-------
> 1 | 10
> 2 | 10
> 3 | 10
> 4 | 10
> 5 | 10
> (5 rows)
>
> *Now* we have a problem -- this only makes sense if the implicit tx
> was run after tx2 and before tx1. So apparent order of execution
> is tx1 -> tx2 -> implicit_tx -> tx1. There is a cycle in the
> apparent order of execution, which causes anomalies which can ruin
> data integrity. Now, if the implicit transaction is not
> serializable, it is allowed to see such things, but if you make it
> serializable (and let's call it tx3 now) it sees a state where only
> tx2 ran; tx1 could not have run:
>
> -- tx3
> mydb=# begin;
> BEGIN
> mydb=# set transaction isolation level serializable;
> SET
> mydb=# select * from foo;
> id | mynum
> ----+-------
> 1 | 10
> 2 | 10
> 3 | 10
> 4 | 10
> 5 | 10
> (5 rows)
>
> mydb=# commit;
> COMMIT
>
> So now, tx1 is not allowed to commit, or for that matter do
> anything else -- it has been "doomed" by tx3:
>
> mydb=# select * from foo;
> ERROR: could not serialize access due to read/write dependencies
> among transactions
> DETAIL: Reason code: Canceled on identification as a pivot, during
> conflict out checking.
> HINT: The transaction might succeed if retried.
>
> Hopefully you are using some framework to automatically detect this
> SQLSTATE and retry the transaction from the start. So on retry,
> tx1 does this:
>
> -- tx1 (retry)
> mydb=# begin;
> BEGIN
> mydb=# set transaction isolation level serializable;
> SET
> mydb=# select * from foo where mynum < 100;
> id | mynum
> ----+-------
> 1 | 10
> 2 | 10
> 3 | 10
> 4 | 10
> 5 | 10
> (5 rows)
>
> mydb=# update foo set mynum = 20 where id < 100;
> UPDATE 5
> mydb=# select * from foo;
> id | mynum
> ----+-------
> 1 | 20
> 2 | 20
> 3 | 20
> 4 | 20
> 5 | 20
> (5 rows)
>
> mydb=# commit;
> COMMIT
>
> Now the result of all successfully committed serializiable
> transactions is consistent with the order tx2 -> tx3 -> tx1. All
> is good.
>
> Kevin Grittner
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Andy Colson 2016-03-10 19:56:40 Re: Best approach for multi-database system
Previous Message Edson Richter 2016-03-10 19:41:52 Best approach for multi-database system

Browse pgsql-sql by date

  From Date Subject
Next Message Kevin Grittner 2016-03-10 19:58:52 Re: Fwd: How does PostgreSQL serializable transaction works? (SELECT/UPDATE vs INSERT conflict)
Previous Message Kevin Grittner 2016-03-09 22:41:02 Re: Fwd: How does PostgreSQL serializable transaction works? (SELECT/UPDATE vs INSERT conflict)