High rate of transaction failure with the Serializable Isolation Level

From: Reza Taheri <rtaheri(at)vmware(dot)com>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: High rate of transaction failure with the Serializable Isolation Level
Date: 2014-07-24 01:18:05
Message-ID: ded604822b28427a80b0751a973e4258@EX13-MBX-013.vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello PGSQL performance community,
[By way of introduction, we are a TPC subcommittee that is developing a benchmark with cloud-like characteristics for virtualized databases. The end-to-end benchmarking kit will be publicly available, and will run on PGSQL]

I am running into very high failure rates when I run with the Serializable Isolation Level. I have simplified our configuration to a single database with a constant workload, a TPC-E workload if you will, to focus on this this problem. We are running with PGSQL 9.2.4, ODBC 2.2.14 (as well as 2.3.3pre, which didn't help), RHEL 6.4, and a 6-way VM with 96GB of memory on a 4-socket Westmere server.

With our 9 transactions running with a mix of SQL_TXN_READ_COMMITTED and SQL_TXN_REPEATABLE_READ, we get less than 1% deadlocks, all of which occur because each row in one table, BROKER, may be read or written by multiple transactions at the same time. So, there are legitimate conflicts, which we deal with using an exponential backoff algorithm that sleeps for 10ms/30ms/90ms/etc.

When we raise the Trade-Result transaction to SQL_TXN_SERIALIZABLE, we face a storm of conflicts. Out of 37,342 Trade-Result transactions, 15,707 hit an error, and have to be rolled back and retired one or more times. The total failure count (due to many transactions failing more than once) is 31,388.

What is unusual is that the majority of the failures occur in a statement that should not have any isolation conflicts. About 17K of failures are from the statement below:
2014-07-23 11:27:15 PDT 26085 ERROR: could not serialize access due to read/write dependencies among transactions
2014-07-23 11:27:15 PDT 26085 DETAIL: Reason code: Canceled on identification as a pivot, during write.
2014-07-23 11:27:15 PDT 26085 HINT: The transaction might succeed if retried.
2014-07-23 11:27:15 PDT 26085 CONTEXT: SQL statement "update TRADE
set T_COMM = comm_amount,
T_DTS = trade_dts,
T_ST_ID = st_completed_id,
T_TRADE_PRICE = trade_price
where T_ID = trade_id"
PL/pgSQL function traderesultframe5(ident_t,value_t,character,timestamp without time zone,trade_t,s_price_t) line 15 at SQL statement

This doesn't make sense since at any given time, only one transaction might possibly be accessing the row that is being updated. There should be no conflicts if we have row-level locking/isolation

The second most common conflict happens 7.6K times in the statement below:
2014-07-23 11:27:23 PDT 26039 ERROR: could not serialize access due to read/write dependencies among transactions
2014-07-23 11:27:23 PDT 26039 DETAIL: Reason code: Canceled on identification as a pivot, during conflict in checking.
2014-07-23 11:27:23 PDT 26039 HINT: The transaction might succeed if retried.
2014-07-23 11:27:23 PDT 26039 CONTEXT: SQL statement "insert
into SETTLEMENT ( SE_T_ID,
SE_CASH_TYPE,
SE_CASH_DUE_DATE,
SE_AMT)
values ( trade_id,
cash_type,
due_date,
se_amount
)"
PL/pgSQL function traderesultframe6(ident_t,timestamp without time zone,character varying,value_t,timestamp without time zone,trade_t,smallint,s_qty_t,character) line 23 at SQL statement

I don't understand why an insert would hit a serialization conflict

We also have 4.5K conflicts when we try to commit:
2014-07-23 11:27:23 PDT 26037 ERROR: could not serialize access due to read/write dependencies among transactions
2014-07-23 11:27:23 PDT 26037 DETAIL: Reason code: Canceled on identification as a pivot, during commit attempt.
2014-07-23 11:27:23 PDT 26037 HINT: The transaction might succeed if retried.
2014-07-23 11:27:23 PDT 26037 STATEMENT: COMMIT

Does PGSQL raise locks to page level when we run with SQL_TXN_SERIALIZABLE? Are there any knobs I can play with to alleviate this? FWIW, the same transactions on MS SQL Server see almost no conflicts.

Thanks,
Reza

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Rural Hunter 2014-07-24 01:30:12 Re: Very slow planning performance on partition table
Previous Message Felipe Santos 2014-07-23 20:19:13 Re: Building multiple indexes on one table.