Re: Failed to delete old ReorderBuffer spilled files

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: atorikoshi <torikoshi_atsushi_z2(at)lab(dot)ntt(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Failed to delete old ReorderBuffer spilled files
Date: 2017-11-22 01:10:27
Message-ID: CAD21AoDkPbCNX-d_VqKrW4rDt5W5Y3=LQr7zYbbxF=uVDayt-A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 22, 2017 at 9:03 AM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> At Tue, 21 Nov 2017 20:53:04 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20171121(dot)205304(dot)90315453(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
>> At Tue, 21 Nov 2017 20:27:25 +0900, atorikoshi <torikoshi_atsushi_z2(at)lab(dot)ntt(dot)co(dot)jp> wrote in <d5dddea9-4182-a7e4-f368-156f5470d244(at)lab(dot)ntt(dot)co(dot)jp>
>> > Thanks for reviewing!
>> >
>> >
>> > On 2017/11/21 18:12, Masahiko Sawada wrote:
>> > > On Tue, Nov 21, 2017 at 3:48 PM, Masahiko Sawada
>> > > <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> > >> On Mon, Nov 20, 2017 at 7:35 PM, atorikoshi
>> > >> <torikoshi_atsushi_z2(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> > >>> Hi,
>> > >>>
>> > >>> I put many queries into one transaction and made ReorderBuffer spill
>> > >>> data to disk, and sent SIGKILL to postgres before the end of the
>> > >>> transaction.
>> > >>>
>> > >>> After starting up postgres again, I observed the files spilled to
>> > >>> data wasn't deleted.
>> > >>>
>> > >>> I think these files should be deleted because its transaction was no
>> > >>> more valid, so no one can use these files.
>> > >>>
>> > >>>
>> > >>> Below is a reproduction instructions.
>> > >>>
>> > >>> ------------------------------------------------
>> > >>> 1. Create table and publication at publiser.
>> > >>>
>> > >>> @pub =# CREATE TABLE t1(
>> > >>> id INT PRIMARY KEY,
>> > >>> name TEXT);
>> > >>>
>> > >>> @pub =# CREATE PUBLICATION pub FOR TABLE t1;
>> > >>>
>> > >>> 2. Create table and subscription at subscriber.
>> > >>>
>> > >>> @sub =# CREATE TABLE t1(
>> > >>> id INT PRIMARY KEY,
>> > >>> name TEXT
>> > >>> );
>> > >>>
>> > >>> @sub =# CREATE SUBSCRIPTION sub
>> > >>> CONNECTION 'host=[hostname] port=[port] dbname=[dbname]'
>> > >>> PUBLICATION pub;
>> > >>>
>> > >>> 3. Put many queries into one transaction.
>> > >>>
>> > >>> @pub =# BEGIN;
>> > >>> INSERT INTO t1
>> > >>> SELECT
>> > >>> i,
>> > >>> 'aaaaaaaaaa'
>> > >>> FROM
>> > >>> generate_series(1, 1000000) as i;
>> > >>>
>> > >>> 4. Then we can see spilled files.
>> > >>>
>> > >>> @pub $ ls -1 ${PGDATA}/pg_replslot/sub/
>> > >>> state
>> > >>> xid-561-lsn-0-1000000.snap
>> > >>> xid-561-lsn-0-2000000.snap
>> > >>> xid-561-lsn-0-3000000.snap
>> > >>> xid-561-lsn-0-4000000.snap
>> > >>> xid-561-lsn-0-5000000.snap
>> > >>> xid-561-lsn-0-6000000.snap
>> > >>> xid-561-lsn-0-7000000.snap
>> > >>> xid-561-lsn-0-8000000.snap
>> > >>> xid-561-lsn-0-9000000.snap
>> > >>>
>> > >>> 5. Kill publisher's postgres process before COMMIT.
>> > >>>
>> > >>> @pub $ kill -s SIGKILL [pid of postgres]
>> > >>>
>> > >>> 6. Start publisher's postgres process.
>> > >>>
>> > >>> @pub $ pg_ctl start -D ${PGDATA}
>> > >>>
>> > >>> 7. After a while, we can see the files remaining.
>> > >>> (Immediately after starting publiser, we can not see these files.)
>> > >>>
>> > >>> @pub $ pg_ctl start -D ${PGDATA}
>> > >>>
>> > >>> When I configured with '--enable-cassert', below assertion error
>> > >>> was appeared.
>> > >>>
>> > >>> TRAP: FailedAssertion("!(txn->final_lsn != 0)", File:
>> > >>> "reorderbuffer.c",
>> > >>> Line: 2576)
>> > >>> ------------------------------------------------
>> > >>>
>> > >>> Attached patch sets final_lsn to the last ReorderBufferChange if
>> > >>> final_lsn == 0.
>> > >>
>> > >> Thank you for the report. I could reproduce this issue with the above
>> > >> step. My analysis is, the cause of that a serialized reorder buffer
>> > >> isn't cleaned up is that the aborted transaction without an abort WAL
>> > >> record has no chance to set ReorderBufferTXN->final_lsn. So if there
>> > >> is such serialized transactions ReorderBufferRestoreCleanup cleanups
>> > >> no files, which is cause of the assertion failure (or a file being
>> > >> orphaned). What do you think?
>> > >>
>> > >> On detail of your patch, I'm not sure it's safe if we set the lsn of
>> > >> other than commit record or abort record to final_lsn. The comment in
>> > >> reorderbuffer.h says,
>> > >>
>> > >> typedef trcut ReorderBufferTXN
>> > >> {
>> > >> (snip)
>> > >>
>> > >> /* ----
>> > >> * LSN of the record that lead to this xact to be committed or
>> > >> * aborted. This can be a
>> > >> * * plain commit record
>> > >> * * plain commit record, of a parent transaction
>> > >> * * prepared transaction commit
>> > >> * * plain abort record
>> > >> * * prepared transaction abort
>> > >> * * error during decoding
>> > >> * ----
>> > >> */
>> > >> XLogRecPtr final_lsn;
>> > >>
>> > >> But with your patch, we could set a lsn of a record that is other than
>> > >> what listed above to final_lsn. One way I came up with is to make
>> >
>> > I added some comments on final_lsn.
>> >
>> > >> ReorderBufferRestoreCleanup accept an invalid value of final_lsn and
>> > >> regards it as a aborted transaction that doesn't has a abort WAL
>> > >> record. So we can cleanup all serialized files if final_lsn of a
>> > >> transaction is invalid.
>> > >
>> > > After more thought, since there are some codes cosmetically setting
>> > > final_lsn when the fate of transaction is determined possibly we
>> > > should not accept a invalid value of final_lsn even in the case.
>> > >
>>
>> It doesn't seem just a cosmetic. The first/last_lsn are used to
>> determin the files to be deleted. On the other hand the TXN
>> cannot have last_lsn since it hasn't see abort/commit record.
>>
>> > My new patch keeps setting final_lsn, but changed its location to the
>> > top of ReorderBufferCleanupTXN().
>> > I think it's a kind of preparation, so doing it at the top improves
>> > readability.
>> >
>> > > Anyway I think you should register this patch to the next commit fest
>> > > so as not forget.
>> >
>> > Thanks for you advice, I've registered this issue as a bug.
>>
>> Using last changing LSN might work but I'm afraid that that fails
>> to remove the last snap file if the crash happens at the very
>> start of a segment.

I think it works even in the case because the we can compute the
largest WAL segment number that we need to remove by using the lsn of
the last change in old transaction. Am I missing something?

>>
>> Anyway all files of the transaction is no longer useless at the
>> time, but it seems that the last_lsn is required to avoid
>> directory scanning at every transaction end.
>>
>> Letting ReorderBufferAbortOld scan the directory and determine
>> the first and last LSN then set to the txn would work but it
>> might be an overkill. Using the beginning LSN of the next segment
>> of the last_change->lsn could surely work... really?
>> (ReorderBufferRestoreCleanup doesn't complain on ENOENT.)
>
> Somehow I deleted exessively while editing. One more possible
> solution is making ReorderBufferAbortOld take final LSN and
> DecodeStandbyOp passes the LSN of XLOG_RUNNING_XACTS record to
> it.
>

Setting final_lsn in ReorderBufferAbortOld seems good to me but I'm
not sure we can use the lsn of XLOG_RUNNING_XACTS record. Doesn't
ReorderBufferRestoreCleanup() raise an error due to ENOENT if the wal
segment having XLOG_RUNNING_XACTS records doesn't have any changes of
the old transaction?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2017-11-22 01:11:17 Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Previous Message Thomas Munro 2017-11-22 00:58:03 Re: [HACKERS] CLUSTER command progress monitor