Re: Add new COPY option REJECT_LIMIT

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
Cc: zhjwpku(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add new COPY option REJECT_LIMIT
Date: 2024-09-24 17:22:52
Message-ID: a0343e25-644e-4435-8399-f2e74c04ca23@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2024/09/24 14:25, torikoshia wrote:
> Updated the patch.

Thanks for updating the patch!

+ REJECT_LIMIT { <replaceable class="parameter">integer</replaceable> }

The curly braces {} seem unnecessary here.

+ When a positive integer value is specified, <command>COPY</command> limits
+ the maximum tolerable number of errors while converting a column's input
+ value into its data type.
+ If input data caused more errors than the specified value, entire
+ <command>COPY</command> fails.
+ Otherwise, <command>COPY</command> discards the input row and continues
+ with the next one.
+ This option must be used with <literal>ON_ERROR</literal> to be set to
+ <literal>ignore</literal>.
+ Just setting <literal>ON_ERROR</literal> to <literal>ignore</literal>
+ tolerates unlimited number of errors.

Regarding the description of REJECT_LIMIT, how about clarifying what the option specifies up front, like this:

----------------
Specifies the maximum number of errors tolerated while converting a column's
input value to its data type, when on_error is set to "ignore." If the input
causes more errors than the specified value, the COPY command fails,
even with on_error set to "ignore." This value must be positive and used with
on_error="ignore". If not specified, on_error="ignore" allows an unlimited
number of errors, meaning COPY will skip all erroneous data.
----------------

+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("number for REJECT_LIMIT must be greater than zero")));
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("value for REJECT_LIMIT must be positive integer")));

The phrases "number for" and "value for" seem unnecessary in the error messages.
Also, "positive number" should be "a positive number." Would it be better to display
the actual specified value in the error message, like: errmsg("REJECT_LIMIT (%s) must
be a positive number", value)?

Lastly, during my testing, if nothing is specified for REJECT_LIMIT
(e.g., COPY ... WITH (..., REJECT_LIMIT)), the COPY command caused a segmentation fault.

>> I'd like to hear if anyone has an opinion on the need for supporting ratio.
>
> Since there was no opinion about it, attached a patch only for REJECT_LIMIT specifying number.

+1

> As there are no opposing opinions, removed 'INFINITY' as well.

+1

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2024-09-24 17:24:56 Re: Conflict detection for update_deleted in logical replication
Previous Message Tom Lane 2024-09-24 17:01:04 Re: Cleaning up ERRCODE usage in our XML code