From: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
---|---|
To: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com> |
Subject: | Re: Assertion failure on hot standby |
Date: | 2010-11-25 07:59:48 |
Message-ID: | AANLkTikGGKPFPBGSVxF2C2oPsMcopchZjUjLU129vrg8@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Nov 24, 2010 at 1:27 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> Hi,
>
> http://archives.postgresql.org/pgsql-hackers/2010-11/msg01303.php
>
> When I did unusual operations (e.g., suspend bgwriter by gdb,
> pgbench -i and issue txid_current many times) on the master
> in order to try to reproduce the above HS error, I encountered
> the following assertion error.
>
> Since I compiled the standby postgres with WAL_DEBUG and
> ran it with wal_debug = on, all the replayed WAL records were
> logged.
>
> ------------------------
> sby LOG: REDO @ 0/134C0490; LSN 0/134C04D0: prev 0/134C0450; xid
> 23253; len 32: Transaction - commit: 2010-11-24 12:15:02.315634+09
> sby LOG: REDO @ 0/134C04D0; LSN 0/134C0510: prev 0/134C0490; xid
> 23254; len 32: Transaction - commit: 2010-11-24 12:15:02.325252+09
> sby LOG: consistent recovery state reached at 0/134C0510
> sby LOG: REDO @ 0/134C0510; LSN 0/134C0550: prev 0/134C04D0; xid
> 23255; len 32: Transaction - commit: 2010-11-24 12:15:09.224343+09
> sby LOG: REDO @ 0/134C0550; LSN 0/134C0580: prev 0/134C0510; xid 0;
> len 16: Standby - AccessExclusive locks: xid 0 db 11910 rel 16409
> sby LOG: REDO @ 0/134C0580; LSN 0/134C05B8: prev 0/134C0550; xid 0;
> len 20: Standby - running xacts: nextXid 23256 latestCompletedXid
> 23255 oldestRunningXid 23256
> TRAP: FailedAssertion("!(((xid) != ((TransactionId) 0)))", File:
> "twophase.c", Line: 1209)
> sby LOG: database system is ready to accept read only connections
> sby LOG: startup process (PID 32666) was terminated by signal 6: Aborted
> sby LOG: terminating any other active server processes
> ------------------------
>
> Does anyone know what the cause of the problem is?
I was able to reproduce this problem. This happens because CHECKPOINT
can write the WAL record indicating that the transaction with XID = 0 has taken
the AccessExclusive lock. This WAL record causes that assertion failure in the
standby.
Here is the procedure to reproduce the problem:
-------------------
1. Execute "DROP TABLE" and suspend the execution before calling
RemoveRelations -> LockRelationOid -> LockAcquire ->
LockAcquireExtended -> LogAccessExclusiveLock
by, for example, using gdb.
2. While "DROP TABLE" is being suspended, execute CHECKPOINT.
This CHECKPOINT will generate the above-mentioned WAL record.
-------------------
To solve the problem, ISTM that XID should be assigned before the information
about AccessExclusive lock becomes visible to another process. Or CHECKPOINT
(i.e., GetRunningTransactionLocks) should ignore the locks with XID = 0.
Thought?
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
From | Date | Subject | |
---|---|---|---|
Next Message | Shigeru HANADA | 2010-11-25 08:12:44 | SQL/MED - file_fdw |
Previous Message | Itagaki Takahiro | 2010-11-25 07:52:51 | Re: format() with embedded to_char() formatter |