Re: Add new COPY option REJECT_LIMIT

From: Junwang Zhao <zhjwpku(at)gmail(dot)com>
To: torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add new COPY option REJECT_LIMIT
Date: 2024-07-19 14:48:31
Message-ID: CAEG8a3JPdGkYKXadAEOphAoiUkY3hHZagk9=UPnyOq386e6gYg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Torikoshia,

On Wed, Jul 17, 2024 at 9:21 PM torikoshia <torikoshia(at)oss(dot)nttdata(dot)com> wrote:
>
> On 2024-07-03 02:07, Fujii Masao wrote:
> > However, if we support REJECT_LIMIT, I'm not sure if the ON_ERROR
> > option is still necessary.
>
> I remembered another reason for the necessity of ON_ERROR.
>
> ON_ERROR defines how to behave when encountering an error and it just
> accepts 'ignore' and 'stop' currently, but is expected to support other
> options such as saving details of errors to a table[1].
> ON_ERROR=stop is a synonym for REJECT_LIMIT=infinity, but I imagine
> REJECT_LIMIT would not replace future options of ON_ERROR.
>
> Considering this and the option we want to add this time is to specify
> an upper limit on the number or ratio of errors, the name of this option
> like "reject_limit" seems better than "ignore_errors".
>
> On Fri, Jul 5, 2024 at 4:13 PM torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
> wrote:
> > On 2024-07-05 12:59, Fujii Masao wrote:
> >> On 2024/07/04 12:05, torikoshia wrote:
> >>> I'm going to update it after discussing the option format as
> >>> described
> >>> below.
>
> Updated the patch.
> 0001 sets limit by the absolute number of error rows and 0002 sets limit
> by ratio of the error.

In patch 0002, the ratio is calculated by the already skipped/processed
rows, but what if a user wants to copy 1000 rows, and he/she can tolerate
10 error rows, so he/she might set *reject_limit 0.01*, but one bad row in the
first 100 rows will fail the entire command, this might surprise the user.

This case can be resolved by 0001 *reject_limit 10*, so I think the *by ratio*
is less useful.

>
> >> If we choose "all" as the keyword, renaming the option to
> >> IGNORE_ERRORS
> >> might be more intuitive and easier to understand than REJECT_LIMIT.
>
> > I feel that 'infinite' and 'unlimited' are unfamiliar values for
> > PostgreSQL parameters, so 'all' might be better and IGNORE_ERRORS would
> > be a better parameter name as your suggestion.
>
> As described above, attached patch adopts REJECT_LIMIT, so it uses
> "infinity".
>
> >> This makes me think it might be better to treat REJECT_LIMIT as
> >> an additional option for ON_ERROR=stop instead of ON_ERROR=ignore
> >> if we adopt your patch. Since ON_ERROR=stop is the default,
> >> users could set the maximum number of allowed errors by specifying
> >> only REJECT_LIMIT. Otherwise, they would need to specify both
> >> ON_ERROR=ignore and REJECT_LIMIT.
>
> > That makes sense.
>
> On my second thought, whatever value ON_ERROR is specified(e.g. ignore,
> stop, table), it seems fine to use REJECT_LIMIT.
> I feel REJECT_LIMIT has both "ignore" and "stop" characteristics,
> meaning it ignores errors until it reaches REJECT_LIMIT and stops when
> it exceeds the REJECT_LIMIT.
> And REJECT_LIMIT seems orthogonal to 'table', which specifies where to
> save error details.
>
> Attached patch allows using REJECT_LIMIT regardless of the ON_ERROR
> option value.
>
>
> [1]
> https://www.postgresql.org/message-id/flat/CACJufxH_OJpVra=0c4ow8fbxHj7heMcVaTNEPa5vAurSeNA-6Q(at)mail(dot)gmail(dot)com
>
> --
> Regards,
>
> --
> Atsushi Torikoshi
> NTT DATA Group Corporation

--
Regards
Junwang Zhao

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2024-07-19 14:52:45 Re: Incremental backup from a streaming replication standby fails
Previous Message David G. Johnston 2024-07-19 14:47:48 Re: behavior of GROUP BY with VOLATILE expressions