Re: [HACKERS] Unlogged tables cleanup

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, konstantin knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Unlogged tables cleanup
Date: 2019-05-13 17:09:17
Message-ID: CA+TgmoarmtbAPFj=tCT4Tm4LqQnwHxJRFCSG2Bm=m6cE-nc=fQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, May 13, 2019 at 12:50 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > AFAICS ResetUnloggedRelations copies the init fork after replaying WAL,
> > so it would be sufficient to have the init fork be recovered from WAL
> > for that to work. However, we also do ResetUnloggedRelations *before*
> > replaying WAL in order to remove leftover not-init-fork files, and that
> > process requires that the init fork is present at that time.
>
> What scenario are you precisely wondering about? That
> ResetUnloggedRelations() could overwrite the main fork, while not yet
> having valid contents (due to the lack of smgrimmedsync())? Shouldn't
> that only be possible while still in an inconsistent state? A checkpoint
> would have serialized the correct contents, and we'd not reach HS
> consistency before having replayed that WAL records resetting the table
> and the init fork consistency?

I think I see what Alvaro is talking about, or at least I think I see
*a* possible problem based on his remarks.

Suppose we create an unlogged table and then crash. The main fork
makes it to disk, and the init fork does not. Before WAL replay, we
remove any main forks that have init forks, but because the init fork
was lost, that does not happen. Recovery recreates the init fork.
After WAL replay, we try to copy_file() each _init fork to the
corresponding main fork. That fails, because copy_file() expects to be
able to create the target file, and here it can't do that because it
already exists.

If that's the scenario, I'm not sure the smgrimmedsync() call is
sufficient. Suppose we log_smgrcreate() but then crash before
smgrimmedsync()... seems like we'd need to do them in the other order,
or else maybe just pass a flag to copy_file() telling it not to be so
picky.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jonathan S. Katz 2019-05-13 17:19:19 Re: PostgreSQL 12: Feature Highlights
Previous Message Alvaro Herrera 2019-05-13 17:07:30 Re: [HACKERS] Unlogged tables cleanup