Re: BUG #13128: Postgres deadlock on startup failure when max_prepared_transactions is not sufficiently high.

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: grant(at)amazon(dot)com
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #13128: Postgres deadlock on startup failure when max_prepared_transactions is not sufficiently high.
Date: 2015-04-23 01:20:49
Message-ID: CAB7nPqT5y9892TzLRzZ+CxvG6FwA206SoXcSbsvP6b-i=n3AUw@mail.gmail.com
Lists: pgsql-bugs

On Thu, Apr 23, 2015 at 7:09 AM, <grant(at)amazon(dot)com> wrote:
> 1. Set max_prepared_transactions to 2 in postgresql.conf.
> 2. Start Postgres.
> 3. Create two uncommitted prepared transactions: BEGIN; PREPARE
> TRANSACTION 'test1'; BEGIN; PREPARE TRANSACTION 'test2';
> 4. Set max_prepared_transactions to 1 in postgresql.conf.
> 5. Restart Postgres again.
>
> At this point the startup fails with a FATAL error, but the postmaster
> process keeps running.
>
> LOG: database system was interrupted; last known up at 2015-04-16 17:19:56 PDT
> LOG: database system was not properly shut down; automatic recovery in progress
> LOG: record with zero length at 0/1826C70
> LOG: redo is not required
> LOG: recovering prepared transaction 1685
> LOG: recovering prepared transaction 1683
> FATAL: maximum number of prepared transactions reached
> HINT: Increase max_prepared_transactions (currently 1).
>
> Looks like there may be an LWLock that is not correctly getting released,
> which causes the process to hang rather than exit.

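Yep, the startup process remains stuck here: it still holds the lock when
the error is raised, and the exit callback then tries to acquire the same
lock again. So we should release the lock before issuing ERROR in
twophase.c: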
  * frame #0: 0x00007fff95004b72 libsystem_kernel.dylib`semop + 10
    frame #1: 0x000000010f59530f postgres`PGSemaphoreLock(sema=0x00000001187a1190) + 63 at pg_sema.c:387
    frame #2: 0x000000010f632af9 postgres`LWLockAcquireCommon(lock=0x000000010fc452e0, mode=LW_EXCLUSIVE, valptr=0x0000000000000000, val=0) + 377 at lwlock.c:1037
    frame #3: 0x000000010f63296b postgres`LWLockAcquire(l=0x000000010fc452e0, mode=LW_EXCLUSIVE) + 43 at lwlock.c:900
    frame #4: 0x000000010f2f975d postgres`AtAbort_Twophase + 93 at twophase.c:284
    frame #5: 0x000000010f2f9f24 postgres`AtProcExit_Twophase(code=1, arg=0) + 20 at twophase.c:246
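
For readers following along, the pattern in the backtrace can be sketched
in isolation. This is only a toy model, not the PostgreSQL code and not the
attached patch: ToyLock, toy_lock_acquire, toy_lock_release, exit_callback
and reserve_slot are all hypothetical names, and the lock reports "would
block forever" by returning false instead of actually hanging.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a non-reentrant lock such as an LWLock. */
typedef struct ToyLock
{
    bool held;
} ToyLock;

/* Returns false where a real non-reentrant lock would block forever. */
static bool
toy_lock_acquire(ToyLock *lock)
{
    if (lock->held)
        return false;           /* the holder would be waiting on itself */
    lock->held = true;
    return true;
}

static void
toy_lock_release(ToyLock *lock)
{
    assert(lock->held);
    lock->held = false;
}

/*
 * Exit-time cleanup that needs the same lock, playing the role that
 * AtAbort_Twophase plays in the backtrace. Returns false if it would hang.
 */
static bool
exit_callback(ToyLock *lock)
{
    if (!toy_lock_acquire(lock))
        return false;
    toy_lock_release(lock);
    return true;
}

/*
 * Try to reserve one slot out of "max" while "used" are already taken.
 * When no slot is free, an error is raised, which is modeled here by
 * running the exit callback directly. Returns true if the caller can
 * proceed or exit cleanly, false if the process would hang.
 */
static bool
reserve_slot(ToyLock *lock, int used, int max, bool release_before_error)
{
    if (!toy_lock_acquire(lock))
        return false;

    if (used >= max)
    {
        /* The fix under discussion: drop the lock before erroring out. */
        if (release_before_error)
            toy_lock_release(lock);
        return exit_callback(lock);
    }

    toy_lock_release(lock);
    return true;
}
```

With one slot already used and a maximum of one, reserve_slot() with
release_before_error set to false reports a hang, and with it set to true
the exit path completes with the lock released.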

The patch attached fixes the problem for me, and I think that this
should be backpatched as well.
(Note to self: check the other calls of LWLockAcquire/Release to see
if there are other code paths in the same situation).
Regards,
--
Michael

Attachment: 20150423_twophase_lwlock_fix.patch (text/x-diff, 1.0 KB)
