Re: Add new COPY option REJECT_LIMIT

From: torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add new COPY option REJECT_LIMIT
Date: 2024-07-05 07:13:10
Message-ID: 979c6d306343f3d665d14eab2cd1b7bc@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2024-07-05 12:59, Fujii Masao wrote:
> On 2024/07/04 12:05, torikoshia wrote:
>> I'm going to update it after discussing the option format as described
>> below.
>
> Thanks!
>
>> I agree that it's possible to use only REJECT_LIMIT without ON_ERROR.
>> I also think it's easy to understand that REJECT_LIMIT=0 is
>> ON_ERROR=stop.
>> However, expressing REJECT_LIMIT='infinity' needs some definition like
>> "setting REJECT_LIMIT to -1 means 'infinity'", doesn't it? If so, I
>> think this might not so intuitive.
>
> How about allowing REJECT_LIMIT to accept the keywords "infinity",
> "unlimited",
> or "all" in addition to a number? This way, users can specify one of
> these
> keywords instead of -1 to ignore all errors. The server code would then
> internally set the REJECT_LIMIT to -1 or another appropriate value when
> these keywords are used, but users wouldn't need to worry about this
> detail.

Agreed.

> If we choose "all" as the keyword, renaming the option to IGNORE_ERRORS
> might be more intuitive and easier to understand than REJECT_LIMIT.

I feel that 'infinite' and 'unlimited' are unfamiliar values for
PostgreSQL parameters, so 'all' might be better and IGNORE_ERRORS would
be a better parameter name as your suggestion.

>> Also, since it seems Snowflake and Redshift have both options
>> equivalent to REJECT_LIMIT and ON_ERROR, having both of them in
>> PostgreSQL COPY might not be surprising:
>> - Snowflake's ON_ERROR accepts "CONTINUE | SKIP_FILE | SKIP_FILE_num |
>> 'SKIP_FILE_num%' | ABORT_STATEMENT"[1]
>> - Redshift has MAXERROR and IGNOREALLERRORS options[2]
>
> Ok, so here's a summary of the options and their behaviors:
>
> To ignore all errors and continue to the end:
>
> - Snowflake: ON_ERROR=CONTINUE
> - Redshift: IGNOREALLERRORS
> - Postgres (with your patch): ON_ERROR=ignore
> - Postgres (with my idea): IGNORE_ERRORS=all
>
> To fail as soon as an error is found:
>
> - Snowflake: ON_ERROR=ABORT_STATEMENT (default) / SKIP_FILE
> - Redshift: MAXERROR=0 (default)
> - Postgres (with your patch): ON_ERROR=stop (default)
> - Postgres (with my idea): IGNORE_ERRORS=0 (default)
>
> To fail when NNN or more errors are found:
>
> - Snowflake: ON_ERROR=SKIP_FILE_NNN
> - Redshift: MAXERROR=NNN
> - Postgres (with your patch): REJECT_LIMIT=NNN-1 and ON_ERROR=ignore
> - Postgres (with my idea): IGNORE_ERRORS=NNN

Thanks for the summary.

> This makes me think it might be better to treat REJECT_LIMIT as
> an additional option for ON_ERROR=stop instead of ON_ERROR=ignore
> if we adopt your patch. Since ON_ERROR=stop is the default,
> users could set the maximum number of allowed errors by specifying
> only REJECT_LIMIT. Otherwise, they would need to specify both
> ON_ERROR=ignore and REJECT_LIMIT.

That makes sense.

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2024-07-05 08:06:27 Re: Parent/child context relation in pg_get_backend_memory_contexts()
Previous Message 杨伯宇 (长堂) 2024-07-05 07:12:37 speed up pg_upgrade with large number of tables