From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Antonin Houska <ah(at)cybertec(dot)at> |
Cc: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: POC: Cleaning up orphaned files using undo logs |
Date: | 2020-11-19 03:22:44 |
Message-ID: | CAA4eK1L5bVTHkCQtZauAEk3CfSZy9Z9eUeBDCkJNkE3uEhJirQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Nov 18, 2020 at 4:03 PM Antonin Houska <ah(at)cybertec(dot)at> wrote:
>
> Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> > On Fri, Nov 13, 2020 at 6:02 PM Antonin Houska <ah(at)cybertec(dot)at> wrote:
> > >
> > > Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > > On Thu, Nov 12, 2020 at 2:45 PM Antonin Houska <ah(at)cybertec(dot)at> wrote:
> > > > >
> > > > >
> > > > > No background undo
> > > > > ------------------
> > > > >
> > > > > Reduced complexity of the patch seems to be the priority at the moment. Amit
> > > > > suggested that cleanup of an orphaned relation file is simple enough to be
> > > > > done on foreground and I agree.
> > > > >
> > > >
> > > > Yeah, I think we should try and see if we can make it work but I
> > > > noticed that there are few places like AbortOutOfAnyTransaction where
> > > > we have the assumption that undo will be executed in the background.
> > > > We need to deal with it.
> > >
> > > I think this is o.k. if we always check for unapplied undo during startup.
> > >
> >
> > Hmm, how it is ok to leave undo (and rely on startup) unless it is a
> > PANIC error. IIRC, this path is invoked in non-panic errors as well.
> > Basically, we won't be able to discard such an undo which doesn't seem
> > like a good idea.
>
> Since failure to apply leaves unconsistent data, I assume it should always
> cause PANIC, shouldn't it?
>
But how can we ensure that AbortOutOfAnyTransaction will be called
only in that scenario?
> (Thomas seems to assume the same in [1].)
>
> > > > > "undo worker" is still there, but it only processes undo requests after server
> > > > > restart because relation data can only be changed in a transaction - it seems
> > > > > cleaner to launch a background worker for this than to hack the startup
> > > > > process.
> > > > >
> > > >
> > > > But, I see there are still multiple undoworkers that are getting
> > > > launched and I am not sure if that works correctly because a
> > > > particular undoworker is connected to a database and then it starts
> > > > processing all the pending undo.
> > >
> > > Each undo worker applies only transactions for its own database, see
> > > ProcessExistingUndoRequests():
> > >
> > > /* We can only process undo of the database we are connected to. */
> > > if (xact_hdr.dboid != MyDatabaseId)
> > > continue;
> > >
> > > Nevertheless, as I've just mentioned in my response to Thomas, I admit that we
> > > should try to live w/o the undo worker altogether.
> > >
> >
> > Okay, but keep in mind that there could be a large amount of undo
> > (unlike redo which has some limit as we can replay it from the last
> > checkpoint) which needs to be processed but it might be okay to live
> > with that for now.
>
> Yes, the information to remove relation file does not take much space in the
> undo log.
>
> > Another thing is that it seems we need to connect to the database to perform
> > it which might appear a bit odd that we don't allow users to connect to the
> > database but internally we are connecting it.
>
> I think the implementation will need to follow the outcome of the part of the
> discussion that starts at [2], but I see your concern. I'm thinking why
> database connection is not needed to apply WAL but is needed for UNDO. I think
> locks make the difference.
>
Yeah, it would be probably a good idea to see if we can make undo
apply work without db-connection especially if we want to do before
allowing connections. The other possibility could be to let discard
worker do this work lazily after allowing connections.
--
With Regards,
Amit Kapila.
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2020-11-19 03:34:33 | Re: Move OpenSSL random under USE_OPENSSL_RANDOM |
Previous Message | Michael Paquier | 2020-11-19 03:13:44 | Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait) |