Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row

From: Kirill Reshke <reshkekirill(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: jian he <jian(dot)universality(at)gmail(dot)com>, Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>, torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
Date: 2024-10-25 21:03:32
Message-ID: CALdSSPi1JE9xc31W6DPAdk-bQHeo3HNAYB-10Biruu-w4GJN0Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi!

On Mon, 21 Oct 2024 at 17:39, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>
>
>
> On 2024/10/21 18:30, Kirill Reshke wrote:
> > v4 no longer applies. It now conflicts with
> > e7834a1a251d4a28245377f383ff20a657ba8262.
> > Also, there were review comments.
> >
> > So, I decided to rebase.
>
> Thanks for the patch! Here are my review comments:
>
> I noticed that on_error=set_to_null does not trigger NOTICE messages for rows
> and columns with errors. It's "unexpected" thing for columns to be silently
> replaced with NULL due to on_error=set_to_null. So, similar to on_error=ignore,
> there should be NOTICE messages indicating which input records had columns
> set to NULL because of data type incompatibility. Without these messages,
> users might not realize that some columns were set to NULL.

Nice catch. That is a feature introduced by
f5a227895e178bf528b18f82bbe554435fb3e64f.

>
> How should on_error=set_to_null behave when reject_limit is set? It seems
> intuitive to trigger an error if the number of rows with columns' data type
> issues exceeds reject_limit, similar to on_error=ignore. This is open to discussion.

Ok, let's discuss. My first suggestion was:

when the REJECT LIMIT is set to some non-zero number and the number of
row NULL replacements exceeds the limit, is it OK to fail. Because
there WAS errors, and we should not tolerate more than $limit errors .
I do find this behavior to be consistent.
But what if we don't set a REJECT LIMIT, it is sane to do all
replacements, as if REJECT LIMIT is inf. But our REJECT LIMIT is zero
(not set).
So, we ignore zero REJECT LIMIT if set_to_null is set.

But while I was trying to implement that, I realized that I don't
understand v4 of this patch. My misunderstanding is about
`t_on_error_null` tests. We are allowed to insert a NULL value for the
first column of t_on_error_null using COPY ON_ERROR SET_TO_NULL. Why
do we do that? My thought is we should try to execute
InputFunctionCallSafe with NULL value (i mean, here [1]) for the
column after we failed to insert the input value. And, if this second
call is successful, we do replacement, otherwise we count the row as
erroneous.

>
> psql's tab-completion should be updated to include SET_TO_NULL.

Agreed.

>
>
> An <replaceable class="parameter">error_action</replaceable> value of
> <literal>stop</literal> means fail the command, while
> <literal>ignore</literal> means discard the input row and continue with the next one.
> + <literal>set_to_null</literal> means the input value will be set to <literal>null</literal> and continue with the next one.
>
> How about merging these two descriptions to one and updating it to the following?
>
> -------------------
> An error_action value of stop means fail the command, ignore means discard the input
> row and continue with the next one, and set_to_null means replace columns with invalid
> input values with NULL and move to the next row.
> -------------------
>

Hm, good catch. Applied almost as you suggested. I did tweak this
"replace columns with invalid input values with " into "replace
columns containing erroneous input values with". Is that OK?

> The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
>
> This should be "... ignore and set_to_null options are ..."?

Sure!

> Regards,
>
> --
> Fujii Masao
> Advanced Computing Technology Center
> Research and Development Headquarters
> NTT DATA CORPORATION
>

Here I sent v6 where review comments, except set_to_null vs reject
limit, were addressed. No new tests added yet, as some details are
unclear...

[1] https://github.com/postgresql-cfbot/postgresql/compare/cf/4810~1...cf/4810#diff-98d8bfd706468f77f8b0d5d0797e3dba3ffaaa88438143ef4cf7fedecaa56827R964
--
Best regards,
Kirill Reshke

Attachment Content-Type Size
v6-0001-on_error-set_to_null.patch application/octet-stream 13.4 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2024-10-25 21:31:38 Re: Fix for consume_xids advancing XIDs incorrectly
Previous Message Pavel Stehule 2024-10-25 20:38:01 Re: proposal: schema variables