From: | torikoshia <torikoshia(at)oss(dot)nttdata(dot)com> |
---|---|
To: | Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Add new COPY option REJECT_LIMIT |
Date: | 2024-07-04 03:05:25 |
Message-ID: | de0551d0f9d9e9072e324e51ed5c426d@oss.nttdata.com |
Lists: | pgsql-hackers |
On 2024-07-03 02:07, Fujii Masao wrote:
Thanks for your comments!
> On 2024/01/26 18:49, torikoshia wrote:
>> Hi,
>>
>> 9e2d870 enabled the COPY command to skip soft error, and I think we
>> can add another option which specifies the maximum tolerable number of
>> soft errors.
>>
>> I remember this was discussed in [1], and feel it would be useful when
>> loading 'dirty' data but there is a limit to how dirty it can be.
>>
>> Attached a patch for this.
>>
>> What do you think?
>
> The patch no longer applies cleanly to HEAD. Could you update it?
I'm going to update it after discussing the option format as described
below.
>
> I think the REJECT_LIMIT feature is useful. Allowing it to be set as
> either the absolute number of skipped rows or a percentage of the
> total input rows is a good idea.
>
> However, if we support REJECT_LIMIT, I'm not sure if the ON_ERROR
> option is still necessary. REJECT_LIMIT seems to cover the same cases.
> For instance, REJECT_LIMIT=infinity can act like ON_ERROR=ignore, and
> REJECT_LIMIT=0 can act like ON_ERROR=stop.
I agree that it's possible to use only REJECT_LIMIT without ON_ERROR.
I also think it's easy to understand that REJECT_LIMIT=0 is
ON_ERROR=stop.
However, expressing REJECT_LIMIT='infinity' needs some definition like
"setting REJECT_LIMIT to -1 means 'infinity'", doesn't it? If so, I
think this might not be so intuitive.
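To make the comparison concrete, here is a minimal Python sketch of the semantics under discussion. It is not PostgreSQL code: `load_rows`, `RejectLimitExceeded`, and the `reject_limit` parameter are hypothetical names, and it assumes the load fails once the number of skipped rows exceeds the limit. An explicit `math.inf` stands in for 'infinity', so no sentinel such as -1 is needed.

```python
import math
from typing import Callable, Iterable, List, Tuple


class RejectLimitExceeded(Exception):
    """Hypothetical error: more rows were skipped than reject_limit allows."""


def load_rows(
    lines: Iterable[str],
    parse: Callable[[str], object],
    reject_limit: float = 0,
) -> Tuple[List[object], int]:
    """Parse each line, skipping malformed ones up to reject_limit.

    Assumption for this sketch: the load aborts as soon as the number of
    skipped rows becomes greater than reject_limit.
    """
    loaded: List[object] = []
    skipped = 0
    for line in lines:
        try:
            loaded.append(parse(line))
        except ValueError:
            # A "soft error": count the row as skipped instead of failing.
            skipped += 1
            if skipped > reject_limit:
                raise RejectLimitExceeded(
                    f"skipped {skipped} rows, limit is {reject_limit}"
                )
    return loaded, skipped


# reject_limit=math.inf never aborts, which behaves like ON_ERROR=ignore.
rows, skipped = load_rows(["1", "x", "3"], int, reject_limit=math.inf)

# reject_limit=0 aborts on the first malformed row, like ON_ERROR=stop.
try:
    load_rows(["1", "x", "3"], int, reject_limit=0)
except RejectLimitExceeded:
    pass
```

In this sketch the two extremes fall out of a single comparison, which is the sense in which REJECT_LIMIT alone could cover ON_ERROR; the open question is only how 'infinity' would be spelled in the option syntax.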
Also, since both Snowflake and Redshift seem to have options equivalent
to both REJECT_LIMIT and ON_ERROR, having both of them in PostgreSQL COPY
might not be surprising:
- Snowflake's ON_ERROR accepts "CONTINUE | SKIP_FILE | SKIP_FILE_num |
'SKIP_FILE_num%' | ABORT_STATEMENT"[1]
- Redshift has MAXERROR and IGNOREALLERRORS options[2]
BTW, after seeing that Snowflake makes SKIP_FILE_num one of the values of
ON_ERROR, I'm wondering whether REJECT_LIMIT should also be a value of
ON_ERROR rather than a separate option.
[1]
https://docs.snowflake.com/en/sql-reference/sql/copy-into-table#copy-options-copyoptions
[2]
https://docs.aws.amazon.com/en_en/redshift/latest/dg/copy-parameters-data-load.html
--
Regards,
--
Atsushi Torikoshi
NTT DATA Group Corporation