Re: Add new COPY option REJECT_LIMIT

From: torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, zhjwpku(at)gmail(dot)com
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add new COPY option REJECT_LIMIT
Date: 2024-07-22 12:37:03
Message-ID: f9feda0ecd4e2e09620a6f9ca8f4aac1@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jul 19, 2024 at 11:48 PM Junwang Zhao <zhjwpku(at)gmail(dot)com> wrote:
Thanks for the comment.

> In patch 0002, the ratio is calculated by the already skipped/processed
> rows, but what if a user wants to copy 1000 rows, and he/she can
> tolerate
> 10 error rows, so he/she might set *reject_limit 0.01*, but one bad row
> in the
> first 100 rows will fail the entire command, this might surprise the
> user.

Since the ratio is calculated after all data is processed, the case "one
bad row in the first 100 rows will fail the entire command" doesn't
happen:

=# \! wc -l 1000rows-with-10err.data
1000 1000rows-with-10err.data

=# COPY t1 from '1000rows-with-10err.data' with (log_verbosity
verbose, reject_limit 0.01);
NOTICE: skipping row due to data type incompatibility at line 10 for
column i: "a"
NOTICE: skipping row due to data type incompatibility at line 11 for
column i: "a"
NOTICE: skipping row due to data type incompatibility at line 12 for
column i: "a"
NOTICE: skipping row due to data type incompatibility at line 13 for
column i: "a"
NOTICE: skipping row due to data type incompatibility at line 14 for
column i: "a"
NOTICE: skipping row due to data type incompatibility at line 15 for
column i: "a"
NOTICE: skipping row due to data type incompatibility at line 16 for
column i: "a"
NOTICE: skipping row due to data type incompatibility at line 17 for
column i: "a"
NOTICE: skipping row due to data type incompatibility at line 18 for
column i: "a"
NOTICE: skipping row due to data type incompatibility at line 19 for
column i: "a"
NOTICE: 10 rows were skipped due to data type incompatibility
COPY 990

On 2024-07-20 02:08, Fujii Masao wrote:
> On 2024/07/19 22:03, Fujii Masao wrote:
>>
>>
>> On 2024/07/17 22:21, torikoshia wrote:
>>> On 2024-07-03 02:07, Fujii Masao wrote:
>>>> However, if we support REJECT_LIMIT, I'm not sure if the ON_ERROR
>>>> option is still necessary.
>>>
>>> I remembered another reason for the necessity of ON_ERROR.
>>>
>>> ON_ERROR defines how to behave when encountering an error and it just
>>> accepts 'ignore' and 'stop' currently, but is expected to support
>>> other options such as saving details of errors to a table[1].
>>
>> Wouldn't it be better to separate the option specifying where
>> error details are output from the ON_ERROR option
>> (which determines behavior when encountering errors)?
>> "table" seems valid for both ON_ERROR=ignore and ON_ERROR=stop.
>
> I still find it odd to accept "table" as a value for ON_ERROR. However,
> "set_to_null" or "replace-column" proposed in [1] seem valid for
> ON_ERROR. So, I'm okay with keeping the ON_ERROR option.

Agreed.

>> On my second thought, whatever value ON_ERROR is specified(e.g.
>> ignore, stop, table), it seems fine to use REJECT_LIMIT.
>> I feel REJECT_LIMIT has both "ignore" and "stop" characteristics,
>> meaning it ignores errors until it reaches REJECT_LIMIT and stops when
>> it exceeds the REJECT_LIMIT.
>
> ON_ERROR specifies how to handle errors, and "stop" means to fail
> the command. So, if ON_ERROR=stop, REJECT_LIMIT should have no effect,
> and the command should fail immediately upon encountering an error.
>
> As in your original proposal, I now think REJECT_LIMIT should only
> apply when ON_ERROR=ignore. The command would ignore errors and
> continue processing, but if the number of errors exceeds REJECT_LIMIT,
> the command should fail. Thought?

Makes sense.
Updated the patch.

> BTW if "set_to_null" is supported someday, REJECT_LIMIT can also
> apply. The command would cinsert NULL into the target table upon
> encountering errors and continue, but fail if the number of errors
> exceed REJECT_LIMIT.

Agreed.

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Attachment Content-Type Size
v3-0001-Add-new-COPY-option-REJECT_LIMIT-number.patch text/x-diff 10.0 KB
v3-0002-Add-new-COPY-option-REJECT_LIMIT-ratio.patch text/x-diff 7.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2024-07-22 12:39:23 Re: Lock-free compaction. Why not?
Previous Message Tomas Vondra 2024-07-22 12:36:40 Re: Make COPY format extendable: Extract COPY TO format implementations