Re: Add new COPY option REJECT_LIMIT

From: torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add new COPY option REJECT_LIMIT
Date: 2024-07-17 13:21:07
Message-ID: 920a29a36032befbeb355cbdfbbe0f63@oss.nttdata.com
Lists: pgsql-hackers

On 2024-07-03 02:07, Fujii Masao wrote:
> However, if we support REJECT_LIMIT, I'm not sure if the ON_ERROR
> option is still necessary.

I remembered another reason for the necessity of ON_ERROR.

ON_ERROR defines how to behave when encountering an error and it just
accepts 'ignore' and 'stop' currently, but is expected to support other
options such as saving details of errors to a table[1].
ON_ERROR=ignore is a synonym for REJECT_LIMIT=infinity, but I imagine
REJECT_LIMIT would not replace future options of ON_ERROR.

Considering this, and given that the option we want to add this time
specifies an upper limit on the number or ratio of errors, a name like
"reject_limit" seems better than "ignore_errors".

On Fri, Jul 5, 2024 at 4:13 PM torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
wrote:
> On 2024-07-05 12:59, Fujii Masao wrote:
>> On 2024/07/04 12:05, torikoshia wrote:
>>> I'm going to update it after discussing the option format as
>>> described
>>> below.

Updated the patch.
0001 sets the limit as an absolute number of error rows, and 0002 sets
it as a ratio of error rows.

>> If we choose "all" as the keyword, renaming the option to
>> IGNORE_ERRORS
>> might be more intuitive and easier to understand than REJECT_LIMIT.

> I feel that 'infinite' and 'unlimited' are unfamiliar values for
> PostgreSQL parameters, so 'all' might be better and IGNORE_ERRORS would
> be a better parameter name as your suggestion.

As described above, the attached patch adopts REJECT_LIMIT, so it uses
"infinity".

>> This makes me think it might be better to treat REJECT_LIMIT as
>> an additional option for ON_ERROR=stop instead of ON_ERROR=ignore
>> if we adopt your patch. Since ON_ERROR=stop is the default,
>> users could set the maximum number of allowed errors by specifying
>> only REJECT_LIMIT. Otherwise, they would need to specify both
>> ON_ERROR=ignore and REJECT_LIMIT.

> That makes sense.

On second thought, whatever value is specified for ON_ERROR (e.g. ignore,
stop, table), it seems fine to use REJECT_LIMIT.
I feel REJECT_LIMIT has both "ignore" and "stop" characteristics,
meaning it ignores errors until it reaches REJECT_LIMIT and stops when
it exceeds the REJECT_LIMIT.
And REJECT_LIMIT seems orthogonal to 'table', which specifies where to
save error details.

The attached patch allows using REJECT_LIMIT regardless of the ON_ERROR
option value.
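
As a rough illustration of the "ignore until the limit, then stop"
behavior described above, here is a minimal Python sketch. It is not
the patch's code: copy_rows, parse, and the RuntimeError are
hypothetical names chosen for the example, and it models only the
absolute-number form of the limit (0001), not the ratio form.

```python
import math


def copy_rows(rows, parse, reject_limit=0):
    """Hypothetical model of REJECT_LIMIT semantics (not PostgreSQL code).

    Malformed rows are skipped while the number of rejected rows is at
    most reject_limit; the load stops as soon as that limit is exceeded.
    """
    loaded = []
    rejected = 0
    for row in rows:
        try:
            loaded.append(parse(row))
        except ValueError:
            rejected += 1
            # "Stop" characteristic: abort once the limit is exceeded.
            if rejected > reject_limit:
                raise RuntimeError(
                    f"skipped more than REJECT_LIMIT ({reject_limit}) rows"
                )
            # "Ignore" characteristic: otherwise skip the row and go on.
    return loaded, rejected


rows = ["1", "x", "2", "y", "3"]
# Two bad rows are tolerated with a limit of 2.
print(copy_rows(rows, int, reject_limit=2))
# An infinite limit never stops, i.e. every error is ignored.
print(copy_rows(rows, int, reject_limit=math.inf))
```

In this model a limit of 0 stops at the first bad row and math.inf
never stops, which is why the limit can be thought of as covering both
the "stop" and "ignore" ends of the range.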

[1]
https://www.postgresql.org/message-id/flat/CACJufxH_OJpVra=0c4ow8fbxHj7heMcVaTNEPa5vAurSeNA-6Q(at)mail(dot)gmail(dot)com

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Attachment                                            Content-Type  Size
v2-0001-Add-new-COPY-option-REJECT_LIMIT_number.patch text/x-diff   9.7 KB
v2-0002-Add-new-COPY-option-REJECT_LIMIT_ratio.patch  text/x-diff   7.5 KB
