Re: Add new COPY option REJECT_LIMIT

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add new COPY option REJECT_LIMIT
Date: 2024-07-05 03:59:11
Message-ID: 6e3dcdc6-8e18-4716-a23b-3bee316aeaa7@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2024/07/04 12:05, torikoshia wrote:
> I'm going to update it after discussing the option format as described below.

Thanks!

> I agree that it's possible to use only REJECT_LIMIT without ON_ERROR.
> I also think it's easy to understand that REJECT_LIMIT=0 is ON_ERROR=stop.
> However, expressing REJECT_LIMIT='infinity' needs some definition like "setting REJECT_LIMIT to -1 means 'infinity'", doesn't it? If so, I think this might not so intuitive.

How about allowing REJECT_LIMIT to accept the keywords "infinity", "unlimited",
or "all" in addition to a number? This way, users can specify one of these
keywords instead of -1 to ignore all errors. The server code would then
internally set the REJECT_LIMIT to -1 or another appropriate value when
these keywords are used, but users wouldn't need to worry about this detail.

If we choose "all" as the keyword, renaming the option to IGNORE_ERRORS
might be more intuitive and easier to understand than REJECT_LIMIT.

> Also, since it seems Snowflake and Redshift have both options equivalent to REJECT_LIMIT and ON_ERROR, having both of them in PostgreSQL COPY might not be surprising:
> - Snowflake's ON_ERROR accepts "CONTINUE | SKIP_FILE | SKIP_FILE_num | 'SKIP_FILE_num%' | ABORT_STATEMENT"[1]
> - Redshift has MAXERROR and IGNOREALLERRORS options[2]

Ok, so here's a summary of the options and their behaviors:

To ignore all errors and continue to the end:

- Snowflake: ON_ERROR=CONTINUE
- Redshift: IGNOREALLERRORS
- Postgres (with your patch): ON_ERROR=ignore
- Postgres (with my idea): IGNORE_ERRORS=all

To fail as soon as an error is found:

- Snowflake: ON_ERROR=ABORT_STATEMENT (default) / SKIP_FILE
- Redshift: MAXERROR=0 (default)
- Postgres (with your patch): ON_ERROR=stop (default)
- Postgres (with my idea): IGNORE_ERRORS=0 (default)

To fail when NNN or more errors are found:

- Snowflake: ON_ERROR=SKIP_FILE_NNN
- Redshift: MAXERROR=NNN
- Postgres (with your patch): REJECT_LIMIT=NNN-1 and ON_ERROR=ignore
- Postgres (with my idea): IGNORE_ERRORS=NNN

This makes me think it might be better to treat REJECT_LIMIT as
an additional option for ON_ERROR=stop instead of ON_ERROR=ignore
if we adopt your patch. Since ON_ERROR=stop is the default,
users could set the maximum number of allowed errors by specifying
only REJECT_LIMIT. Otherwise, they would need to specify both
ON_ERROR=ignore and REJECT_LIMIT.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Lakhin 2024-07-05 04:00:00 Re: Use generation memory context for tuplestore.c
Previous Message Michael Paquier 2024-07-05 01:13:06 Re: Add pg_get_acl() function get the ACL for a database object