From: | torikoshia <torikoshia(at)oss(dot)nttdata(dot)com> |
---|---|
To: | Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Add new COPY option REJECT_LIMIT |
Date: | 2024-07-17 13:21:07 |
Message-ID: | 920a29a36032befbeb355cbdfbbe0f63@oss.nttdata.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 2024-07-03 02:07, Fujii Masao wrote:
> However, if we support REJECT_LIMIT, I'm not sure if the ON_ERROR
> option is still necessary.
I remembered another reason for the necessity of ON_ERROR.
ON_ERROR defines how to behave when encountering an error and it just
accepts 'ignore' and 'stop' currently, but is expected to support other
options such as saving details of errors to a table[1].
ON_ERROR=stop is a synonym for REJECT_LIMIT=infinity, but I imagine
REJECT_LIMIT would not replace future options of ON_ERROR.
Considering this and the option we want to add this time is to specify
an upper limit on the number or ratio of errors, the name of this option
like "reject_limit" seems better than "ignore_errors".
On Fri, Jul 5, 2024 at 4:13 PM torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
wrote:
> On 2024-07-05 12:59, Fujii Masao wrote:
>> On 2024/07/04 12:05, torikoshia wrote:
>>> I'm going to update it after discussing the option format as
>>> described
>>> below.
Updated the patch.
0001 sets limit by the absolute number of error rows and 0002 sets limit
by ratio of the error.
>> If we choose "all" as the keyword, renaming the option to
>> IGNORE_ERRORS
>> might be more intuitive and easier to understand than REJECT_LIMIT.
> I feel that 'infinite' and 'unlimited' are unfamiliar values for
> PostgreSQL parameters, so 'all' might be better and IGNORE_ERRORS would
> be a better parameter name as your suggestion.
As described above, attached patch adopts REJECT_LIMIT, so it uses
"infinity".
>> This makes me think it might be better to treat REJECT_LIMIT as
>> an additional option for ON_ERROR=stop instead of ON_ERROR=ignore
>> if we adopt your patch. Since ON_ERROR=stop is the default,
>> users could set the maximum number of allowed errors by specifying
>> only REJECT_LIMIT. Otherwise, they would need to specify both
>> ON_ERROR=ignore and REJECT_LIMIT.
> That makes sense.
On my second thought, whatever value ON_ERROR is specified(e.g. ignore,
stop, table), it seems fine to use REJECT_LIMIT.
I feel REJECT_LIMIT has both "ignore" and "stop" characteristics,
meaning it ignores errors until it reaches REJECT_LIMIT and stops when
it exceeds the REJECT_LIMIT.
And REJECT_LIMIT seems orthogonal to 'table', which specifies where to
save error details.
Attached patch allows using REJECT_LIMIT regardless of the ON_ERROR
option value.
--
Regards,
--
Atsushi Torikoshi
NTT DATA Group Corporation
Attachment | Content-Type | Size |
---|---|---|
v2-0001-Add-new-COPY-option-REJECT_LIMIT_number.patch | text/x-diff | 9.7 KB |
v2-0002-Add-new-COPY-option-REJECT_LIMIT_ratio.patch | text/x-diff | 7.5 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Alexander Pyhalov | 2024-07-17 13:24:28 | Asynchronous MergeAppend |
Previous Message | Alena Rybakina | 2024-07-17 12:53:24 | Re: POC, WIP: OR-clause support for indexes |