Re: Parallel copy

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Greg Nancarrow <gregn4422(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alastair Turner <minion(at)decodable(dot)me>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel copy
Date: 2020-10-14 13:41:57
Message-ID: CALDaNm31pGG+L9N4HbM0mO4iuceih4mJ5s87jEwOPaFLpmDKyQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Oct 8, 2020 at 8:43 AM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
>
> On Thu, Oct 8, 2020 at 5:44 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> > Attached v6 patch with the fixes.
> >
>
> Hi Vignesh,
>
> I noticed a couple of issues when scanning the code in the following
patch:
>
> v6-0003-Allow-copy-from-command-to-process-data-from-file.patch
>
> In the following code, it will put a junk uint16 value into *destptr
> (and thus may well cause a crash) on a Big Endian architecture
> (Solaris Sparc, s390x, etc.):
> You're storing a (uint16) string length in a uint32 and then pulling
> out the lower two bytes of the uint32 and copying them into the
> location pointed to by destptr.
>
>
> static void
> +CopyStringToSharedMemory(CopyState cstate, char *srcPtr, char *destptr,
> + uint32 *copiedsize)
> +{
> + uint32 len = srcPtr ? strlen(srcPtr) + 1 : 0;
> +
> + memcpy(destptr, (uint16 *) &len, sizeof(uint16));
> + *copiedsize += sizeof(uint16);
> + if (len)
> + {
> + memcpy(destptr + sizeof(uint16), srcPtr, len);
> + *copiedsize += len;
> + }
> +}
>
> I suggest you change the code to:
>
> uint16 len = srcPtr ? (uint16)strlen(srcPtr) + 1 : 0;
> memcpy(destptr, &len, sizeof(uint16));
>
> [I assume string length here can't ever exceed (65535 - 1), right?]
>
> Looking a bit deeper into this, I'm wondering if in fact your
> EstimateStringSize() and EstimateNodeSize() functions should be using
> BUFFERALIGN() for EACH stored string/node (rather than just calling
> shm_toc_estimate_chunk() once at the end, after the length of packed
> strings and nodes has been estimated), to ensure alignment of start of
> each string/node. Other Postgres code appears to be aligning each
> stored chunk using shm_toc_estimate_chunk(). See the definition of
> that macro and its current usages.
>

I'm not handling this, this is similar to how it is handled in other places.

> Then you could safely use:
>
> uint16 len = srcPtr ? (uint16)strlen(srcPtr) + 1 : 0;
> *(uint16 *)destptr = len;
> *copiedsize += sizeof(uint16);
> if (len)
> {
> memcpy(destptr + sizeof(uint16), srcPtr, len);
> *copiedsize += len;
> }
>
> and in the CopyStringFromSharedMemory() function, then could safely use:
>
> len = *(uint16 *)srcPtr;
>
> The compiler may be smart enough to optimize-away the memcpy() in this
> case anyway, but there are issues in doing this for architectures that
> take a performance hit for unaligned access, or don't support
> unaligned access.

Changed it to uin32, so that there are no issues in case if length exceeds
65535 & also to avoid problems in Big Endian architecture.

> Also, in CopyXXXXFromSharedMemory() functions, you should use palloc()
> instead of palloc0(), as you're filling the entire palloc'd buffer
> anyway, so no need to ask for additional MemSet() of all buffer bytes
> to 0 prior to memcpy().
>

I have changed palloc0 to palloc.

Thanks Greg for reviewing & providing your comments. These changes are
fixed in one of my earlier mail [1] that I sent.
[1]
https://www.postgresql.org/message-id/CALDaNm1n1xW43neXSGs%3Dc7zt-mj%2BJHHbubWBVDYT9NfCoF8TuQ%40mail.gmail.com

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Anastasia Lubennikova 2020-10-14 14:07:54 Re: Deleting older versions in unique indexes to avoid page splits
Previous Message vignesh C 2020-10-14 13:29:48 Re: Parallel copy