Re: Getting rid of "tuple concurrently updated" elog()s with concurrent DDLs (at least ALTER TABLE)

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Getting rid of "tuple concurrently updated" elog()s with concurrent DDLs (at least ALTER TABLE)
Date: 2017-12-28 06:30:04
Message-ID: 20171228063004.GB6181@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 27, 2017 at 04:53:42PM +0900, Michael Paquier wrote:
> Indeed, this bit is important, and RangeVarGet does so. Looking at
> get_object_address(), it also handles locking of the object. Using that
> as interface to address all the concurrency problems could be appealing,
> and less invasive for a final result as many DDLs are visibly
> impacted (still need to make a list here). What do you think?

So, I have looked at all the object types to spot concurrency problems,
and I have found some more. And the winners are:
- database
- role
- domain
- event trigger
- FDW
- server
- user mapping
- type
- tablespace

Most of the failures show "tuple concurrently updated" using a given set
of ALTER commands running concurrently, but I have been able to trigger
as well one "cache lookup failure" for domains, and a duplicate key value
violation for databases.

I have done some extra checks with parallel ALTER commands for the
following objects as well, without spotting issues:
- access method
- cast
- attribute
- extension
- view
- policy
- publication
- subscription
- rule
- transform
- text search stuff
- statistics
- operator [family], etc.

As looking for those failures manually is no fun, I am attaching a patch
which includes a set of isolation regression tests able to trigger
them. You just need to look for unwelcome ERROR messages and you are
good to go. The goal to move forward will be to take care of all those
failures. Please note that isolation tests don't allow dynamic input, so
those have no tests but you can reproduce an error easily with ALTER
TABLESPACE SET SCHEMA between two sessions. Note as well that the domain
test will fail because the cache lookup error on domains exposes the
domain OID, but that's nothing to care about.

Opinions or thoughts?
--
Michael

Attachment Content-Type Size
concurrent-ddl-specs.patch text/plain 66.6 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2017-12-28 07:19:00 Re: [JDBC] [HACKERS] Channel binding support for SCRAM-SHA-256
Previous Message Asif Naeem 2017-12-28 05:51:46 Re: How to Works with Centos