Quick Links

Re: POC: Cleaning up orphaned files using undo logs

From:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: POC: Cleaning up orphaned files using undo logs
Date:	2019-08-21 07:54:53
Message-ID:	CAFiTN-smHReMT71VM=qTe8kz8+7-YtLvQJjE86EvfD_mwE6Y7w@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Tue, Aug 20, 2019 at 7:57 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Mon, Aug 19, 2019 at 2:04 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > Currently, In UnpackedUndoRecord we store all members directly which
> > are set by the caller. We store pointers to some header which are
> > allocated internally by the undo layer and the caller need not worry
> > about setting them. So now you are suggesting to put other headers
> > also as structures in UnpackedUndoRecord. I as such don't have much
> > problem in doing that but I think initially Robert designed
> > UnpackedUndoRecord structure this way so it will be good if Robert
> > provides his opinion on this.
>
> I don't believe that's what is being suggested. It seems to me that
> the thing Andres is complaining about here has roots in the original
> sketch that I did for this code. The oldest version I can find is
> here:
>
> https://github.com/EnterpriseDB/zheap/commit/7d194824a18f0c5e85c92451beab4bc6f044254c
>
> In this version, and I think still in the current version, there is a
> two-stage marshaling strategy. First, the individual fields from the
> UnpackedUndoRecord get copied into global variables (yes, that was my
> fault, too, at least in part!) which are structures. Then, the
> structures get copied into the target buffer. The idea of that design
> was to keep the code simple, but it didn't really work out, because
> things got a lot more complicated between the time I wrote those 3244
> lines of code and the >3000 lines of code that live in this patch
> today. One thing that change was that we moved more and more in the
> direction of considering individual fields as separate objects to be
> separately included or excluded, whereas when I wrote that code I
> thought we were going to have groups of related fields that stood or
> fell together. That idea turned out to be wrong. (There is the
> even-larger question here of whether we ought to take Heikki's
> suggestion and make this whole thing a lot more generic, but let's
> start by discussing how the design that we have today could be
> better-implemented.)
>
> If I understand Andres correctly, he's arguing that we ought to get
> rid of the two-stage marshaling strategy. During decoding, he wants
> data to go directly from the buffer that contains it to the
> UnpackedUndoRecord without ever being stored in the UnpackUndoContext.
> During insertion, he wants data to go directly from the
> UnpackedUndoRecord to the buffer that contains it. Or, at least, if
> there has to be an intermediate format, he wants it to be just a chunk
> of raw bytes, rather than a bunch of individual fields like we have in
> UndoPackContext currently. I think that's a reasonable goal. I'm not
> as concerned about it as he is from a performance point of view, but I
> think it would make the code look nicer, and that would be good. If
> we save CPU cycles along the way, that is also good.
>
> In broad outline, what this means is:
>
> 1. Any field in the UndoPackContext that starts with urec_ goes away.
> 2. Instead of something like InsertUndoBytes((char *)
> &(ucontext->urec_fxid), ...) we'd write InsertUndobytes((char *)
> &uur->uur_fxid, ...).
> 3. Similarly instead of ReadUndoBytes((char *) &ucontext->urec_fxid,
> ...) we'd write ReadUndoBytes((char *) &uur->uur_fxid, ...).
> 4. It seems slightly trickier to handle the cases where we've got a
> structure instead of individual fields, like urec_hd. But those could
> be broken down into field-by-field reads and writes, e.g. in this case
> one call for urec_type and a second for urec_info.
> 5. For uur_group and uur_logswitch, the code would need to allocate
> those subsidiary structures before copying into them.
>
> To me, that seems like it ought to be a pretty straightforward change
> that just makes things simpler. We'd probably need to pass the
> UnpackedUndoRecord to BeginUnpackUndo instead of FinishUnpackUndo, and
> keep a pointer to it in the UnpackUndoContext, but that seems fine.
> FinishUnpackUndo would end up just about empty, maybe entirely empty.
>
> Is that a reasonable idea?
>
I have already attempted that part and I feel it is not making code
any simpler than what we have today. For packing, it's fine because I
can process all the member once and directly pack it into one memory
chunk and I can insert that to the buffer by one call of
InsertUndoBytes and that will make the code simpler.

But, while unpacking if I directly unpack to the UnpackUndoRecord then
there are few complexities. I am not saying those are difficult to
implement but code may not look better.

a) First, we need to add extra stages for unpacking as we need to do
field by field.

b) Some of the members like uur_payload and uur_tuple are not the same
type in the UnpackUndoRecord compared to how it is stored in the page.
In UnpackUndoRecord those are StringInfoData whereas on the page we
store it as UndoRecordPayload header followed by the actual data. I
am not saying we can not unpack this directly we can do it like,
first read the payload length from the page in uur_payload.len then
read tuple length in uur_tuple.len then read both the data. And, for
that, we will have to add extra stages.

c) Currently, in UnpackUndoContext the members are stored in the same
order in which we are storing them to the page whereas in
UnpackUndoRecord they are stored in the order such that they are more
convenient for them to understand, like all the fields which are set
by the caller are separate from the fields which are allocated
internally by the undo layer (transaction header and the log switch
header). Now, for directly unpacking to the UnpackUndoRecord, we need
to read them out of order which will make code more unreadable.

Another option could be that we unpack some part directly into the
UnapackUndoRecord (individual fields) and other parts to
UnpackUndoContext (structures, payload) and in Finalise only copy
those parts from UnpackUndoContext to UnapackUndoRecord. The code
might look bit confusing though.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

In response to

Re: POC: Cleaning up orphaned files using undo logs at 2019-08-20 14:27:44 from Robert Haas

Responses

Re: POC: Cleaning up orphaned files using undo logs at 2019-08-21 15:34:16 from Robert Haas

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Konstantin Knizhnik	2019-08-21 08:54:29	Re: Global temporary tables
Previous Message	Magnus Hagander	2019-08-21 07:41:43	Re: "ago" times on buildfarm status page