Re: POC: Cleaning up orphaned files using undo logs

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: Cleaning up orphaned files using undo logs
Date: 2019-08-24 14:19:19
Message-ID: CA+Tgmob-fVTFJmsnH8KW34h_Qr6hj4G9m3W_wNX1k5oJo394Rg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 21, 2019 at 4:34 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> ReleaseResourcesAndProcessUndo() is only supposed to be called after
> AbortTransaction(), and the additional steps it performs --
> AtCleanup_Portals() and AtEOXact_Snapshot() or alternatively
> AtSubCleanup_Portals -- are taken from Cleanup(Sub)Transaction.
> That's not crazy; the other steps in Cleanup(Sub)Transaction() look
> like stuff that's intended to be performed when we're totally done
> with this TransactionState stack entry, whereas these things are
> slightly higher-level cleanups that might even block undo (e.g.
> undropped portal prevents orphaned file cleanup). Granted, there are
> no comments explaining why those particular cleanup steps are
> performed here, and it's possible some other approach is better, but I
> think perhaps it's not quite as flagrantly broken as you think.

Andres smacked me with the clue-bat off-list and now I understand why
this is broken: there's no guarantee that running the various
AtEOXact/AtCleanup functions actually puts the transaction back into a
good state. They *might* return the system to the state that it was
in immediately following StartTransaction(), but they also might not.
Moreover, ProcessUndoRequestForEachLogCat uses PG_TRY/PG_CATCH and
then discards the error without performing *any cleanup at all* but
then goes on and tries to do undo for other undo log categories
anyway. That is totally unsafe.

I think that there should only be one chance to perform undo actions,
and as I said or at least alluded to before, if that throws an error,
it shouldn't be caught by a TRY/CATCH block but should be handled by
the state machine in xact.c. If we're not going to make the state
machine handle these conditions, the addition of
TRANS_UNDO/TBLOCK_UNDO/TBLOCK_SUBUNDO is really missing the boat. I'm
still not quite sure of the exact sequence of steps: we clearly need
AtCleanup_Portals() and a bunch of the other stuff that happens during
CleanupTransaction(), ideally including the freeing of memory, to
happen before we try undo. But I don't have a great feeling for how to
make that happen, and it seems more desirable for undo to begin as
soon as the transaction fails rather than waiting until
Cleanup(Sub)Transaction() time. I think some more research is needed
here.

> I am also not convinced that semi-critical sections are a bad idea,

Regarding this, after further thought and discussion with Andres,
there are two cases here that call for somewhat different handling:
temporary undo, and subtransaction abort.

In the case of a subtransaction abort, we can't proceed with the
toplevel transaction unless we succeed in applying the
subtransaction's undo, but that does not require killing off the
backend. It might be a better idea to just fail the containing
subtransaction with the error that occurred during undo apply; if
there are multiple levels of subtransactions present then we might
fail in the same way several times, but eventually we'll fail at the
top level, forcibly kick the undo into the background, and the session
can continue. The background workers will, hopefully, eventually
recover the situation. Even if they can't, because, say, the failure
is due to a bug or whatever, killing off the session doesn't really
help.

In the case of temporary undo, killing the session is a much more
appealing approach. If we don't, how will that undo ever get
processed? We could retry at some point (like every time we return to
the toplevel command prompt?) or just ignore the fact that we didn't
manage to perform those undo actions and leave that undo there like an
albatross, accumulating more and more undo behind it until the session
exits or the disk fills up. The latter strikes me as a terrible user
experience, especially because for wire protocol reasons we'd have to
swallow the errors or at best convert them to warnings, but YMMV.

Anyway, probably these cases should not be handled exactly the same
way, but exactly what to do probably depends on the previous question:
how exactly does the integration into xact.c's state machine work,
anyway?

Meanwhile, I've been working up a prototype of how the undorequest.c
stuff I sent previously could be integrated with xact.c. In working
on that, I've realized that there seem to be two different tasks. One
is tracking the information that we'll need to have available to
perform undo actions. The other is the actual transaction state
manipulation: when and how do we abort transactions, cleanup
transactions, start new transactions specifically for undo? How are
transactions performing undo specially marked, if at all? The
attached patch includes a new module, undostate.c/h, which tries to
handle the first of those things; this is just a prototype, and is
missing some pieces marked with XXX, but I think it's probably the
right general direction. It will still need to be plugged into a
framework for launching undo apply background workers (which might
require some API adjustments) and it needs xact.c to handle the core
transactional stuff. But hopefully it will help to illustrate how the
undorequest.c stuff that I sent before can actually be put to use.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
v3-0001-New-undo-request-manager-now-with-some-xact.c-int.patch application/octet-stream 79.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2019-08-24 14:51:58 Re: "ago" times on buildfarm status page
Previous Message Emre Hasegeli 2019-08-24 12:23:11 Re: mingw32 floating point diff