Re: ALTER TABLE deadlock with concurrent INSERT

From: Noah Misch <noah(at)leadboat(dot)com>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: "Hackers (PostgreSQL)" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ALTER TABLE deadlock with concurrent INSERT
Date: 2011-03-04 07:36:42
Message-ID: 20110304073642.GA953@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 02, 2011 at 12:25:16PM -0800, Joe Conway wrote:
> I'm working with a client on an application upgrade script which
> executes a function to conditionally do an:
>
> ALTER TABLE foo ALTER COLUMN bar SET DATA TYPE baz
>
> If this is run while the application is concurrently doing inserts into
> foo, we are occasionally seeing deadlocks. Aside from the fact that they
> are better off not altering the table amid concurrent inserts, I'm
> trying to understand why this is even able to happen. I expect one to
> block the other, not a deadlock.
>
> This is 8.4.1 (I know, I know, I have advised strongly that they upgrade
> to 8.4.latest).
>
> We have not been able to repeat this forcibly. Here is what the log shows:
> ------------------------------
> 2011-02-25 14:38:07 PST [31686]: [1-1] ERROR: deadlock detected
> 2011-02-25 14:38:07 PST [31686]: [2-1] DETAIL: Process 31686 waits for
> AccessExclusiveLock on relation 16896 of database 16386; blocked by
> process 31634.
> Process 31634 waits for RowExclusiveLock on relation 16902 of
> database 16386; blocked by process 31686.
> Process 31686: SELECT change_column_type('attribute_summary',
> 'sequence_number', 'numeric');
> Process 31634: insert into attribute_summary (attribute_value,
> sequence_number, attribute_id) values ($1, $2, $3)
> 2011-02-25 14:38:07 PST [31686]: [3-1] HINT: See server log for query
> details.
> 2011-02-25 14:38:07 PST [31686]: [4-1] CONTEXT: SQL statement "ALTER
> TABLE attribute_summary ALTER COLUMN sequence_number SET DATA TYPE numeric"
> PL/pgSQL function "change_column_type" line 18 at EXECUTE statement
> 2011-02-25 14:38:07 PST [31686]: [5-1] STATEMENT: SELECT
> change_column_type('attribute_summary', 'sequence_number', 'numeric');
> ------------------------------

Does relation 16902 (attribute_summary) have a foreign key constraint over the
sequence_number column, in either direction, with relation 16896? That would
explain it:

session 1: ALTER TABLE attribute_summary ... <sleeps after relation_openrv in transformAlterTableStmt>
session 2: SELECT 1 FROM rel16896 LIMIT 0;
session 2: SELECT 1 FROM attribute_summary LIMIT 0; <blocks>
session 1: <wakes up; continues ALTER TABLE: deadlock upon locking rel16896>

Off the cuff, I think you could make sure this never deadlocks with a PL/pgSQL
recipe like this:

LOOP
BEGIN
LOCK TABLE rel16896;
LOCK TABLE attribute_summary NOWAIT;
EXIT;
EXCEPTION WHEN lock_not_available THEN
END;
END LOOP;

Granted, the cure may be worse than the disease.

nm

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2011-03-04 07:42:11 Re: Sync Rep v19
Previous Message Magnus Hagander 2011-03-04 07:03:14 Re: why is max standby delay only 35 minutes?