Re: POC: Cleaning up orphaned files using undo logs

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: Cleaning up orphaned files using undo logs
Date: 2019-08-09 08:29:14
Message-ID: CAA4eK1J55LU=F4MY4rpx5yOZD7YxWp1buf-+uGUfToEejE1crg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jul 22, 2019 at 8:39 PM Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
>
> On Mon, 22 Jul 2019 at 14:21, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> I have started review of
> 0014-Allow-execution-and-discard-of-undo-by-background-wo.patch. Below
> are some quick comments to start with:
>
> +++ b/src/backend/access/undo/undoworker.c
>
> +#include "access/xact.h"
> +#include "access/undorequest.h"
> Order is not alphabetical
>

Fixed this and a few others.

> + * Each undo worker then start reading from one of the queue the requests for
> start=>starts
> queue=>queues
>
> -------------
>
> + rc = WaitLatch(MyLatch,
> + WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
> + 10L, WAIT_EVENT_BGWORKER_STARTUP);
> +
> + /* emergency bailout if postmaster has died */
> + if (rc & WL_POSTMASTER_DEATH)
> + proc_exit(1);
> I think now, thanks to commit cfdf4dc4fc9635a, you don't have to
> explicitly handle postmaster death; instead you can use
> WL_EXIT_ON_PM_DEATH. Please check at all such places where this is
> done in this patch.
>
> -------------
>

Fixed both of the above issues.

> +UndoWorkerGetSlotInfo(int slot, UndoRequestInfo *urinfo)
> +{
> + /* Block concurrent access. */
> + LWLockAcquire(UndoWorkerLock, LW_EXCLUSIVE);
> +
> + MyUndoWorker = &UndoApplyCtx->workers[slot];
> Not sure why MyUndoWorker is used here. Can't we use a local variable
> ? Or do we intentionally attach to the slot as a side-operation ?
>
> -------------
>

I have changed the code around this such that we first attach to the
slot and then get the required info. Also, I don't see the need of
exclusive lock here, so changed it to shared lock.

> + * Get the dbid where the wroker should connect to and get the worker
> wroker=>worker
>
> -------------
>
> + BackgroundWorkerInitializeConnectionByOid(urinfo.dbid, 0, 0);
> 0, 0 => InvalidOid, 0
>
> + * Set the undo worker request queue from which the undo worker start
> + * looking for a work.
> start => should start
> a work => work
>
> --------------
>

Fixed both of these.

> + if (!InsertRequestIntoErrorUndoQueue(urinfo))
> I was thinking what happens if for some reason
> InsertRequestIntoErrorUndoQueue() itself errors out. In that case, the
> entry will not be marked invalid, and so there will be no undo action
> carried out because I think the undo worker will exit. What happens
> next with this entry ?

I think this will change after integration with Robert's latest patch
on queues, so will address along with it if required.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2019-08-09 08:34:05 default_table_access_method is not in sample config file
Previous Message Kyotaro Horiguchi 2019-08-09 08:23:16 Re: Small const correctness patch