From: | Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Soumyadeep Chakraborty <soumyadeep2007(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Redundant tuple copy in tqueueReceiveSlot() |
Date: | 2020-09-18 10:46:59 |
Message-ID: | CAJ3gD9eiyq_vQOPFJ22WVeP1-DnciQ9g2nLu=QJx=t-WE9DJMA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, 17 Sep 2020 at 08:55, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2020-09-17 14:20:50 +1200, Thomas Munro wrote:
> > I wonder if there is a way we could make "Minimal Tuples but with the
> > length travelling separately (and perhaps chopped off?)" into a
> > first-class concept... It's also a shame to be schlepping a bunch of
> > padding bytes around.
Yeah, I think we can pass a "length" data separately, but since the
receiver end already is assuming that it knows the received data is a
minimal tuple, I thought why not skip passing this redundant
component. But anyways, if you and Andres are suggesting that being
able to skip the copy is important for virtual tuples as well, then I
think the approach you suggested (supplying an allocated memory to the
tuple API for conversion) would be one of the better options with us,
if not the only good option. Maybe I will try looking into the shm_mq
working to see if we can come up with a good solution.
>
> There really is no justification for having MinimalTuples, as we have
> them today at least, anymore. We used to rely on being able to construct
> pointers to MinimalTuples that are mostly compatible with HeapTuple. But
> I think there's none of those left since PG 12.
Ah ok.
>
> I think it'd make a bit more sense to do some steps towards having a
> more suitable "minimal" tuple representation, rather than doing this
> local, pretty ugly, hacks. A good way would be to just starting to
> remove the padding, unnecessary fields etc from MinimalTuple.
So there are two things we wish to do :
1. Prevent an extra tuple forming step before sending minimal tuple
data. Possibly device an shm_mq API to get memory to write tuple of a
given length, and device something like
FormMinimalTupleDataInHere(memory_allocated_by_shm_mq) which will
write minimal tuple data.
2. Shrink the MinimalTupleData structure because it no longer needs
the current padding etc and we can substitute this new MinimalTuple
structure with the current one all over the code wherever it is
currently being used.
If we remove the unnecessary fields from the tuple data being sent to
Gather node, then we need to again form a MinimalTuple at the
receiving end, which again adds an extra tuple forming. So I
understand, that's the reason why you are saying we should shrink the
MinimalTupleData structure itself, in which case we will continue to
use the received new MinimalTupledata as an already-formed tuple, like
how we are doing now.
Now, the above two things (1. and 2.) look independent to me. Suppose
we first do 1. i.e. we come up with a good way to form an in-place
MinimalTuple at the sender's end, without any change to the
MinimalTupleData. And then when we do 2. i.e. shrink the
MinimalTupleData; but for that, we won't require any change in the
in-place-tuple-forming API we wrote in 1. . Just the existing
underlying function heap_form_minimal_tuple() or something similar
might need to be changed. At least that's what I feel right now.
>
> I also think that it'd be good to look at a few of the other places that
> are heavily bottlenecked by MinimalTuple overhead before designing new
> API around this. IIRC it's e.g. very easy to see hash joins spending a
> lot of time doing MinimalTuple copies & conversions.
Yeah, makes sense. The above FormMinimalTupleDataInHere() should be
able to be used for these other places as well. Will keep that in
mind.
>
> > > Thomas, I guess you had a different approach in mind when you said
> > > about "returning either success or
> > > hey-that-buffer's-too-small-I-need-N-bytes". But what looks clear to
> >
> > Yeah I tried some things like that, but I wasn't satisfied with any of
> > them; basically the extra work involved in negotiating the size was a
> > bit too high.
Hmm, ok. Let me see if there is any way around this.
>> On the other hand, because my interface was "please
> > write a MinimalTuple here!", it had the option to *form* a
> > MinimalTuple directly in place, whereas your approach can only avoid
> > creating and destroying a temporary tuple when the source is a heap
> > tuple.
True.
>
> There's a lot of cases where the source is a virtual slot (since we'll
> often project stuff below Gather). So it'd be quite advantageous to
> avoid building an unnecessary HeapTuple in those cases.
Yeah right.
From | Date | Subject | |
---|---|---|---|
Next Message | Juan José Santamaría Flecha | 2020-09-18 10:47:06 | Re: BUG #15858: could not stat file - over 4GB |
Previous Message | Ashutosh Bapat | 2020-09-18 10:33:47 | Re: Report error position in partition bound check |