From: | Antonin Houska <ah(at)cybertec(dot)at> |
---|---|
To: | Dmitry Dolgov <9erthalion6(at)gmail(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: POC: Cleaning up orphaned files using undo logs |
Date: | 2021-09-21 07:02:02 |
Message-ID: | 2005.1632207722@antos |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Dmitry Dolgov <9erthalion6(at)gmail(dot)com> wrote:
> > On Tue, Sep 14, 2021 at 10:51:42AM +0200, Antonin Houska wrote:
> > Antonin Houska <ah(at)cybertec(dot)at> wrote:
> >
> > > Dmitry Dolgov <9erthalion6(at)gmail(dot)com> wrote:
> >
> > > > * By throwing at the patchset `make installcheck` I'm getting from time to time
> > > > and error on the restart:
> > > >
> > > > TRAP: FailedAssertion("BufferIsValid(buffers[nbuffers].buffer)",
> > > > File: "undorecordset.c", Line: 1098, PID: 6055)
> > > >
> > > > From what I see XLogReadBufferForRedoExtended finds an invalid buffer and
> > > > returns BLK_NOTFOUND. The commentary says:
> > > >
> > > > If the block was not found, then it must be discarded later in
> > > > the WAL.
> > > >
> > > > and continues with skip = false, but fails to get a page from an invalid
> > > > buffer few lines later. It seems that the skip flag is supposed to be used
> > > > this situation, should it also guard the BufferGetPage part?
> > >
> > > I could see this sometime too, but can't reproduce it now. It's also not clear
> > > to me how XLogReadBufferForRedoExtended() can return BLK_NOTFOUND, as the
> > > whole undo log segment is created at once, even if only part of it is needed -
> > > see allocate_empty_undo_segment().
> >
> > I could eventually reproduce the problem. The root cause was that WAL records
> > were created even for temporary / unlogged undo, and thus only empty pages
> > could be found during replay. I've fixed that and also setup regular test for
> > the BLK_NOTFOUND value. That required a few more fixes to UndoReplay().
> >
> > Attached is a new version.
>
> Yep, makes sense, thanks. I have few more questions:
>
> * The use case with orphaned files is working somewhat differently after
> the rebase on the latest master, do you observe it as well? The
> difference is ApplyPendingUndo -> SyncPostCheckpoint doesn't clean up
> an orphaned relation file immediately (only later on checkpoint)
> because of empty pendingUnlinks. I haven't investigated more yet, but
> seems like after this commit:
>
> commit 7ff23c6d277d1d90478a51f0dd81414d343f3850
> Author: Thomas Munro <tmunro(at)postgresql(dot)org>
> Date: Mon Aug 2 17:32:20 2021 +1200
>
> Run checkpointer and bgwriter in crash recovery.
>
> Start up the checkpointer and bgwriter during crash recovery (except in
> --single mode), as we do for replication. This wasn't done back in
> commit cdd46c76 out of caution. Now it seems like a better idea to make
> the environment as similar as possible in both cases. There may also be
> some performance advantages.
>
> something has to be updated (pendingOps are empty right now, so no
> unlink request is remembered).
I haven't been debugging that part recently, but yes, this commit is relevant,
thanks for pointing that out! Attached is a patch that should fix it. I'll
include it in the next version of the patch series, unless you tell me that
something is still wrong.
--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Attachment | Content-Type | Size |
---|---|---|
sync.patch | text/x-diff | 583 bytes |
From | Date | Subject | |
---|---|---|---|
Next Message | Dmitry Dolgov | 2021-09-21 08:07:55 | Re: POC: Cleaning up orphaned files using undo logs |
Previous Message | Pavel Stehule | 2021-09-21 06:50:31 | Re: proposal: possibility to read dumped table's name from file |