Re: [HACKERS] Unlogged tables cleanup

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, konstantin knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Unlogged tables cleanup
Date: 2019-05-23 13:14:59
Message-ID: CA+TgmoY7oBoeuF5UaLRpx2SgcGVs4iB0UJcTqjkpoQ2S5sx9ug@mail.gmail.com
Lists: pgsql-hackers

On Thu, May 23, 2019 at 2:43 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> On Tue, May 21, 2019 at 08:39:18AM -0400, Robert Haas wrote:
> > Yes. I thought I had described it. You create an unlogged table,
> > with an index of a type that does not smgrimmedsync(), your
> > transaction commits, and then the system crashes, losing the _init
> > fork for the index.
>
> The init forks won't magically go away, except in one case for empty
> routines not going through shared buffers.

No magic is required. If you haven't called fsync(), the file might
not be there after a system crash.
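
To illustrate the point, here is a minimal sketch (in Python, not
PostgreSQL code) of what making a newly created file crash-safe
typically takes on a POSIX system: fsync the file, then fsync the
directory that holds its entry. The helper name durable_create and the
file name in the usage note are hypothetical, and the directory fsync
assumes a POSIX platform.

```python
import os


def durable_create(path, data):
    """Create `path` with `data`, then force the contents and the
    directory entry to stable storage (POSIX only; hypothetical helper)."""
    # "xb" fails if the file already exists, so this always creates a new file.
    with open(path, "xb") as f:
        f.write(data)
        f.flush()               # push Python's userspace buffer to the kernel
        os.fsync(f.fileno())    # ask the kernel to flush the file to disk

    # The new name lives in the parent directory; without this second
    # fsync the file itself may be missing after a crash.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Skipping either fsync leaves a window in which a crash can lose the
file, e.g. durable_create("/tmp/demo/16384_init", b"") versus a plain
open()/write()/close().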

Going through shared_buffers guarantees that the file will be
fsync()'d before the next checkpoint, but I'm talking about a scenario
where you crash before the next checkpoint.

> Then, empty routines going through shared buffers fill in one or more
> buffers, mark it/them as empty, dirty it/them, log the page(s) and then
> unlock the buffer(s). If a crash happens after the transaction
> commits, so we would still have the init page in WAL, and at the end
> of recovery we would know about it.

Yeah, but the problem is that the current system requires us to know
about it at the *beginning* of recovery. See my earlier remarks:

Suppose we create an unlogged table and then crash. The main fork
makes it to disk, and the init fork does not. Before WAL replay, we
remove any main forks that have init forks, but because the init fork
was lost, that does not happen. Recovery recreates the init fork.
After WAL replay, we try to copy_file() each _init fork to the
corresponding main fork. That fails, because copy_file() expects to be
able to create the target file, and here it can't do that because it
already exists.
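
The sequence above can be sketched as a toy model. This is Python, not
the actual recovery code: the data directory is just a set of file
names, the function names are hypothetical, and "16384" stands in for
an arbitrary relation file name. The only behavior carried over from
the description is that the final copy refuses to overwrite an existing
target.

```python
INIT_SUFFIX = "_init"


def reset_before_replay(files):
    """Before WAL replay: remove each main fork that has an init fork."""
    for name in [f for f in files if f.endswith(INIT_SUFFIX)]:
        files.discard(name[: -len(INIT_SUFFIX)])


def copy_init_to_main(files):
    """After WAL replay: copy each init fork to its main fork.
    Like the copy described above, this must create the target."""
    for name in sorted(f for f in files if f.endswith(INIT_SUFFIX)):
        main = name[: -len(INIT_SUFFIX)]
        if main in files:
            raise FileExistsError(main)
        files.add(main)


def recover(files_after_crash, init_forks_in_wal):
    """Run the three phases on a copy of the post-crash file set."""
    files = set(files_after_crash)
    reset_before_replay(files)      # sees only init forks that survived
    files |= init_forks_in_wal      # replay recreates logged init forks
    copy_init_to_main(files)
    return files
```

With both forks on disk, recover({"16384", "16384_init"}, set())
succeeds. With the init fork lost, recover({"16384"}, {"16384_init"})
raises FileExistsError, because the stale main fork was never removed.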

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
