From: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
---|---|
To: | vignesh C <vignesh21(at)gmail(dot)com> |
Cc: | Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alastair Turner <minion(at)decodable(dot)me>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Parallel copy |
Date: | 2020-06-24 08:46:23 |
Message-ID: | CALj2ACUj1303Pvhhnp_J6+BG0d9Fp1bRDCKL=pQThJ4+8HAGNw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
It looks like the parsing of newly introduced "PARALLEL" option for
COPY FROM command has an issue(in the
0002-Framework-for-leader-worker-in-parallel-copy.patch),
Mentioning ....PARALLEL '4ar2eteid'); would pass with 4 workers since
atoi() is being used for converting string to integer which just
returns 4, ignoring other strings.
I used strtol(), added error checks and introduced the error "
improper use of argument to option "parallel"" for the above cases.
parallel '4ar2eteid');
ERROR: improper use of argument to option "parallel"
LINE 5: parallel '1\');
Along with the updated patch
0002-Framework-for-leader-worker-in-parallel-copy.patch, also
attaching all the latest patches from [1].
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Jun 23, 2020 at 12:22 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Tue, Jun 23, 2020 at 8:07 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > I have attached the patch for the same with the fixes.
>
> The patches were not applying on the head, attached the patches that can be applied on head.
> I have added a commitfest entry[1] for this feature.
>
> [1] - https://commitfest.postgresql.org/28/2610/
>
>
> On Tue, Jun 23, 2020 at 8:07 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>>
>> Thanks Ashutosh For your review, my comments are inline.
>> On Fri, Jun 19, 2020 at 5:41 PM Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
>> >
>> > Hi,
>> >
>> > I just got some time to review the first patch in the list i.e. 0001-Copy-code-readjustment-to-support-parallel-copy.patch. As the patch name suggests, it is just trying to reshuffle the existing code for COPY command here and there. There is no extra changes added in the patch as such, but still I do have some review comments, please have a look:
>> >
>> > 1) Can you please add some comments atop the new function PopulateAttributes() describing its functionality in detail. Further, this new function contains the code from BeginCopy() to set attribute level options used with COPY FROM such as FORCE_QUOTE, FORCE_NOT_NULL, FORCE_NULL etc. in cstate and along with that it also copies the code from BeginCopy() to set other infos such as client encoding type, encoding conversion etc. Hence, I think it would be good to give it some better name, basically something that matches with what actually it is doing.
>> >
>>
>> There is no new code added in this function, some part of code from
>> BeginCopy was made in to a new function as this part of code will also
>> be required for the parallel copy workers before the workers start the
>> actual copy operation. This code was made into a function to avoid
>> duplication. Changed the function name to PopulateGlobalsForCopyFrom &
>> added few comments.
>>
>> > 2) Again, the name for the new function CheckCopyFromValidity() doesn't look good to me. From the function name it appears as if it does the sanity check of the entire COPY FROM command, but actually it is just doing the sanity check for the target relation specified with COPY FROM. So, probably something like CheckTargetRelValidity would look more sensible, I think? TBH, I am not good at naming the functions so you can always ignore my suggestions about function and variable names :)
>> >
>>
>> Changed as suggested.
>> > 3) Any reason for not making CheckCopyFromValidity as a macro instead of a new function. It is just doing the sanity check for the target relation.
>> >
>>
>> I felt there is reasonable number of lines in the function & it is not
>> in performance intensive path, so I preferred function over macro.
>> Your thoughts?
>>
>> > 4) Earlier in CopyReadLine() function while trying to clear the EOL marker from cstate->line_buf.data (copied data), we were not checking if the line read by CopyReadLineText() function is a header line or not, but I can see that your patch checks that before clearing the EOL marker. Any reason for this extra check?
>> >
>>
>> If you see the caller of CopyReadLine, i.e. NextCopyFromRawFields does
>> nothing for the header line, server basically calls CopyReadLine
>> again, it is a kind of small optimization. Anyway server is not going
>> to do anything with header line, I felt no need to clear EOL marker
>> for header lines.
>> /* on input just throw the header line away */
>> if (cstate->cur_lineno == 0 && cstate->header_line)
>> {
>> cstate->cur_lineno++;
>> if (CopyReadLine(cstate))
>> return false; /* done */
>> }
>>
>> cstate->cur_lineno++;
>>
>> /* Actually read the line into memory here */
>> done = CopyReadLine(cstate);
>> I think no need to make a fix for this. Your thoughts?
>>
>> > 5) I noticed the below spurious line removal in the patch.
>> >
>> > @@ -3839,7 +3953,6 @@ static bool
>> > CopyReadLine(CopyState cstate)
>> > {
>> > bool result;
>> > -
>> >
>>
>> Fixed.
>> I have attached the patch for the same with the fixes.
>> Thoughts?
>>
>> Regards,
>> Vignesh
>> EnterpriseDB: http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | movead.li@highgo.ca | 2020-06-24 08:49:31 | Re: pg_resetwal --next-transaction-id may cause database failed to restart. |
Previous Message | Magnus Hagander | 2020-06-24 08:46:17 | Re: should libpq also require TLSv1.2 by default? |