Re: Add new COPY option REJECT_LIMIT

From: torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: zhjwpku(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add new COPY option REJECT_LIMIT
Date: 2024-09-26 13:38:18
Message-ID: dd542f3ce7e15cde7e53824041f2a2f4@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2024-09-25 02:22, Fujii Masao wrote:
Thanks for your review!

Attached v5 patch.

> On 2024/09/24 14:25, torikoshia wrote:
>> Updated the patch.
>
> Thanks for updating the patch!
>
> + REJECT_LIMIT { <replaceable
> class="parameter">integer</replaceable> }
>
> The curly braces {} seem unnecessary here.
>
> + When a positive integer value is specified,
> <command>COPY</command> limits
> + the maximum tolerable number of errors while converting a
> column's input
> + value into its data type.
> + If input data caused more errors than the specified value,
> entire
> + <command>COPY</command> fails.
> + Otherwise, <command>COPY</command> discards the input row and
> continues
> + with the next one.
> + This option must be used with <literal>ON_ERROR</literal> to be
> set to
> + <literal>ignore</literal>.
> + Just setting <literal>ON_ERROR</literal> to
> <literal>ignore</literal>
> + tolerates unlimited number of errors.
>
> Regarding the description of REJECT_LIMIT, how about clarifying what
> the option specifies up front, like this:
>
> ----------------
> Specifies the maximum number of errors tolerated while converting a
> column's
> input value to its data type, when on_error is set to "ignore." If the
> input
> causes more errors than the specified value, the COPY command fails,
> even with on_error set to "ignore." This value must be positive and
> used with
> on_error="ignore". If not specified, on_error="ignore" allows an
> unlimited
> number of errors, meaning COPY will skip all erroneous data.
> ----------------

Thanks for writing it. It seems better.

> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("number for REJECT_LIMIT must be greater than zero")));
> + }
> + else
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("value for REJECT_LIMIT must be positive integer")));
>
> The phrases "number for" and "value for" seem unnecessary in the error
> messages.
> Also, "positive number" should be "a positive number." Would it be
> better to display
> the actual specified value in the error message, like:
> errmsg("REJECT_LIMIT (%s) must
> be a positive number", value)?

Agreed.

> Lastly, during my testing, if nothing is specified for REJECT_LIMIT
> (e.g., COPY ... WITH (..., REJECT_LIMIT)), the COPY command caused a
> segmentation fault.

Oh, thanks for finding it. Fixed.
>
>
>>> I'd like to hear if anyone has an opinion on the need for supporting
>>> ratio.
>>
>> Since there was no opinion about it, attached a patch only for
>> REJECT_LIMIT specifying number.
>
> +1
>
>> As there are no opposing opinions, removed 'INFINITY' as well.
>
> +1
>
> Regards,

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Attachment Content-Type Size
v5-0001-Add-new-COPY-option-REJECT_LIMIT.patch text/x-diff 8.8 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Francesco Degrassi 2024-09-26 14:15:56 RFC/PoC: GUC option to enable tuple queue autoflush for parallel workers
Previous Message Florents Tselai 2024-09-26 12:59:51 Re: PATCH: jsonpath string methods: lower, upper, initcap, l/r/btrim, replace, split_part