Re: POC: Cleaning up orphaned files using undo logs

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: POC: Cleaning up orphaned files using undo logs
Date:	2019-07-16 11:13:05
Message-ID:	CAA4eK1K9Sd3eRuckAOpg7hqp0r4+NQqFuq23LBz1oJaMeX=P1g@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Jul 16, 2019 at 2:09 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Mon, Jul 1, 2019 at 3:54 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>
> Reviewing Amit's 0016:
>
> performUndoActions appears to be badly-designed. For starters, it's
> sometimes wrong: the only place it gets set to true is in
> UndoActionsRequired (which is badly named, because from the name you
> expect it to return a Boolean and to not have side effects, but
> instead it doesn't return anything and does have side effects).
> UndoActionsRequired() only gets called from selected places, like
> AbortCurrentTransaction(), so the rest of the time it just returns a
> wrong answer. Now maybe it's never called at those times, but there's
> no guard to prevent a function like CanPerformUndoActions() (which is
> also badly named, because performUndoActions tells you whether you
> need to perform undo actions, not whether it's possible to perform
> undo actions) from being called before the flag is set. I think that
> this flag should be either (1) maintained eagerly - so that wherever
> we set start_urec_ptr we also set the flag right away or (2) removed -
> so when we need to know, we just loop over all of the undo categories
> on the spot, which is not that expensive because there aren't that
> many of them.
>

I would prefer to go with (2). So, I will change
CanPerformUndoActions() to loop over the undo categories and return
whether any undo actions need to be performed. I will also rename it
to NeedToPerformUndoActions or UndoActionsRequired; any better
suggestions?

> It seems pointless to make PrepareTransaction() take undo pointers as
> arguments, because those pointers are just extracted from the
> transaction state, to which PrepareTransaction() has a pointer.
>

Agreed, will remove.

> Thomas has already objected to another proposal to add functions that
> turn 32-bit XIDs into 64-bit XIDs. Therefore, I feel confident in
> predicting that he will likewise object to GetEpochForXid. I think
> this needs to be changed somehow, maybe by doing what the XXX comment
> you added suggests.
>

We can do what the comment says, but there is a similar usage in
undodiscard.c as well, so I am not sure that is the right thing. I
think Thomas is suggesting that we open-code it where it is required
and safe to do so. I have responded to his email; let us see what he
has to say, and we can modify this patch based on that.

> This patch has some problems with naming consistency. There's a
> function called PushUndoRequest() which calls a function called
> RegisterRollbackReq() to do the heart of the work. So, is it undo or
> rollback? Are we pushing or registering? Is it a request or a req?
>

I think we can rename PushUndoRequest to RegisterUndoRequest and
RegisterRollbackReq to RegisterUndoRequestGuts.

> For bonus points, the flag that the function sets is called
> undo_req_pushed, which is halfway in between the two competing
> terminologies. Other gripes about PushUndoRequest: push is vague and
> doesn't really explain what's happening, "apllying" is a typo,
> per_level is a poor variable name and shouldn't be declared volatile.
> This function has problems with naming in other places, too; please go
> through all of the names carefully and make them consistent and
> adequately descriptive.
>

Okay, will change as per suggestion.

> I am not a fan of applying_subxact_undo. I think we should look for a
> better design there. A couple of things occur to me. One is that we
> don't necessarily need to go to FATAL; we could just force the current
> transaction and all of its subtransactions to fail all the way out to
> the top level, but then perhaps allow new transactions to be started
> afterwards. I'm not sure that's worth it, but it would work, and I
> think it has precedent in SxactIsDoomed. Assuming we're going to stick
> with the current FATAL plan, I think we should do something like
> invent a new kind of critical section that forces ERROR to be promoted
> to FATAL and then use it here. We could call it a semi-critical or
> locally-critical section, and the undo machinery could use it, but
> then also so could other things. I've wanted that sort of concept
> before, so I think it's a good idea to try to have something general
> and independent of undo. The same concept could be used in
> PerformUndoActions() instead of having to invent
> pg_rethrow_as_fatal(), so we'd have two uses for this mechanism right
> away.
>

Okay, I will investigate along the lines of the semi-critical section.
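
As a starting point for that discussion, here is a rough sketch of how
such a section might behave, modelled loosely on how a nesting counter
is used for ordinary critical sections. All names here
(StartSemiCritSection, EndSemiCritSection, PromoteErrorLevel, the
LEVEL_* constants) are hypothetical and the level values are
illustrative; only their ordering matters.

```c
#include <assert.h>

/* Illustrative severity levels; not the server's actual values. */
enum
{
    LEVEL_WARNING = 19,
    LEVEL_ERROR = 20,
    LEVEL_FATAL = 21,
    LEVEL_PANIC = 22
};

/* Hypothetical nesting depth of semi-critical sections. */
static int  SemiCritSectionCount = 0;

void
StartSemiCritSection(void)
{
    SemiCritSectionCount++;
}

void
EndSemiCritSection(void)
{
    assert(SemiCritSectionCount > 0);
    SemiCritSectionCount--;
}

/*
 * Decide the effective level of a report.  Inside a semi-critical
 * section an ERROR becomes FATAL; every other level is left alone.
 */
int
PromoteErrorLevel(int elevel)
{
    if (elevel == LEVEL_ERROR && SemiCritSectionCount > 0)
        return LEVEL_FATAL;
    return elevel;
}
```

Under this assumption, both the subtransaction undo path and
PerformUndoActions() would bracket their work with the start and end
calls rather than rethrowing errors by hand.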

> FinishPreparedTransactions() tries to apply undo actions while
> interrupts are still held. Is that necessary?
>

I don't think so. I'll think about it some more and report back if I
see any problem; otherwise, I will do RESUME_INTERRUPTS before
performing the undo actions.

> Can we avoid it?
>
> It seems highly likely that the logic added to the TBLOCK_SUBCOMMIT
> case inside CommitTransactionCommand and also into
> ReleaseCurrentSubTransaction should have been added to
> CommitSubTransaction instead. If that's not true, then we have to
> believe that the TBLOCK_SUBRELEASE call to CommitSubTransaction needs
> different treatment from the other two cases, which sounds unlikely;
> we also have to explain why undo is somehow different from all of
> these other releases that are already handled in that function, not in
> its callers.
>

Yeah, it is better to move that code from ReleaseSavepoint to here,
or rather to CommitSubTransaction as you suggest.

> I also strongly suspect it is altogether wrong to do
> this before CommitSubTransaction sets s->state to TRANS_COMMIT; what
> if a subxact callback throws an error?
>

Are you worried that it might lead to the actions being executed
twice? If so, I think we prevent that during replay of the actions,
and that can also happen in other ways. I am not saying that we should
not move that code block to the location you are suggesting, but I
think the current code is not wrong either.

> For related reasons, I don't think that the changes to
> ReleaseSavepoint() are right either. Notice the header comment: "As
> above, we don't
> actually do anything here except change blockState." The "as above"
> part of the comment probably didn't originally refer to
> DefineSavepoint(), which definitely does do other stuff, but to
> something like EndImplicitTransactionBlock() or EndTransactionBlock(),
> and DefineSavepoint() got stuck in the middle later. Anyway, your
> patch makes the comment false by doing actual state changes in this
> function, rather than just marking the subtransactions for commit.
> But why should that be right? If none of the many other bits of state
> are manipulated here rather than in CommitSubTransaction(), why is
> undo the one thing that is different? I guess this is basically just
> compensation for the lack of any of this code in the TBLOCK_SUBRELEASE
> path which I noted in the previous paragraph, but I still think the
> right answer is to put it all in CommitSubTransaction() *after* we set
> TRANS_COMMIT.
>

Agreed, will change accordingly.

> There are a number of things I either don't like or don't understand
> about PerformUndoActions. One is that undo_req_pushed gets passed to
> this function. That just looks really odd from an abstraction point
> of view. Basically, we have a function whose job is to "perform undo
> actions," and it gets a flag as an argument that tells it to not
> actually perform some of the undo actions: that's odd. I think the
> reason it's like that is because of the issue we've been discussing
> elsewhere that there's a separate undo request for each category.
>

The reason was that if we don't have that check here, we need to do
the same check in both callers. As there are just two callers, moving
it there should be okay. I think if we do that, then the loop over
persistence levels can probably also be moved into the callers.
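
A rough sketch of the shape that would give us is below. It assumes
that undo_req_pushed means the request for that category was already
handed off (for example, to a background worker), so the caller skips
it. The struct layout, the function names, and the "applied" counter
are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; the patch's real types and names may differ. */
typedef uint64_t UndoRecPtr;
#define InvalidUndoRecPtr ((UndoRecPtr) 0)
#define NUM_UNDO_CATEGORIES 3

typedef struct
{
    UndoRecPtr  start_urec_ptr[NUM_UNDO_CATEGORIES];
    bool        undo_req_pushed[NUM_UNDO_CATEGORIES];
    int         applied[NUM_UNDO_CATEGORIES];   /* illustration only */
} XactUndoState;

/* Callee: applies undo for one category, with no skip flag to consult. */
void
PerformUndoActionsForCategory(XactUndoState *state, int category)
{
    assert(category >= 0 && category < NUM_UNDO_CATEGORIES);
    state->applied[category]++;     /* placeholder for the real work */
}

/*
 * Caller: owns both the loop over categories and the decision to skip
 * a category.  Returns the number of categories applied here.
 */
int
ApplyUndoInCaller(XactUndoState *state)
{
    int         napplied = 0;

    for (int i = 0; i < NUM_UNDO_CATEGORIES; i++)
    {
        if (state->start_urec_ptr[i] == InvalidUndoRecPtr)
            continue;           /* no undo was generated */
        if (state->undo_req_pushed[i])
            continue;           /* assumed: already handed off */
        PerformUndoActionsForCategory(state, i);
        napplied++;
    }
    return napplied;
}
```

With this split, the function that performs undo actions always
performs them, and the skip decision is visible at each call site.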

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Re: POC: Cleaning up orphaned files using undo logs at 2019-07-15 20:39:19 from Robert Haas

Responses

Re: POC: Cleaning up orphaned files using undo logs at 2019-07-16 14:02:08 from Amit Kapila
Re: POC: Cleaning up orphaned files using undo logs at 2019-07-16 16:22:11 from Robert Haas
