Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row

From: Kirill Reshke <reshkekirill(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: jian he <jian(dot)universality(at)gmail(dot)com>, Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>, torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
Date: 2024-11-09 12:55:04
Message-ID: CALdSSPjYw5g7_sc++bRcxOnC7jW6O2qiSkgdKRUYFXZZv3-Ktw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 7 Nov 2024 at 23:00, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>
>
>
> On 2024/10/26 6:03, Kirill Reshke wrote:
> > when the REJECT LIMIT is set to some non-zero number and the number of
> > row NULL replacements exceeds the limit, is it OK to fail. Because
> > there WAS errors, and we should not tolerate more than $limit errors .
> > I do find this behavior to be consistent.
>
> +1
>
>
> > But what if we don't set a REJECT LIMIT, it is sane to do all
> > replacements, as if REJECT LIMIT is inf.
>
> +1

After thinking for a while, I'm now more opposed to this approach. I
think we should count rows with erroneous data as errors only if
null substitution for these rows failed, not the total number of rows
which were modified.
Then, to respect the REJECT LIMIT option, we compare this number with
the limit. This is actually simpler approach IMHO. What do You think?

> > But while I was trying to implement that, I realized that I don't
> > understand v4 of this patch. My misunderstanding is about
> > `t_on_error_null` tests. We are allowed to insert a NULL value for the
> > first column of t_on_error_null using COPY ON_ERROR SET_TO_NULL. Why
> > do we do that? My thought is we should try to execute
> > InputFunctionCallSafe with NULL value (i mean, here [1]) for the
> > column after we failed to insert the input value. And, if this second
> > call is successful, we do replacement, otherwise we count the row as
> > erroneous.
>
> Your concern is valid. Allowing NULL to be stored in a column with a NOT NULL
> constraint via COPY ON_ERROR=SET_TO_NULL does seem unexpected. As you suggested,
> NULL values set by SET_TO_NULL should probably be re-evaluated.

Thank you. I updated the patch with a NULL re-evaluation.

PFA v7. I did not yet update the doc for this patch version, waiting
for feedback about REJECT LIMIT + SET_TO_NULL behaviour.

Best regards,
Kirill Reshke

Attachment Content-Type Size
v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch application/octet-stream 20.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message wenhui qiu 2024-11-09 14:08:51 Re: New GUC autovacuum_max_threshold ?
Previous Message Peter Eisentraut 2024-11-09 12:43:13 Re: Proper object locking for GRANT/REVOKE