Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row

From: Kirill Reshke <reshkekirill(at)gmail(dot)com>
To: jian he <jian(dot)universality(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>, torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
Date: 2024-11-19 20:02:49
Message-ID: CALdSSPgLVBHNr97pYgmVfi+YdH4bqwk9YVvbuvhJ4sCr7eEVOA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 19 Nov 2024, 13:52 jian he, <jian(dot)universality(at)gmail(dot)com> wrote:
>
> On Sat, Nov 16, 2024 at 5:55 PM Kirill Reshke <reshkekirill(at)gmail(dot)com> wrote:
> >
> > I am attaching my v8 for reference.
> >
>
> in your v8.
>
> <varlistentry>
> <term><literal>REJECT_LIMIT</literal></term>
> <listitem>
> <para>
> Specifies the maximum number of errors tolerated while converting a
> column's input value to its data type, when <literal>ON_ERROR</literal> is
> set to <literal>ignore</literal>.
> If the input contains more erroneous rows than the specified
> value, the <command>COPY</command>
> command fails, even with <literal>ON_ERROR</literal> set to
> <literal>ignore</literal>.
> </para>
> </listitem>
> </varlistentry>
>
> then above description not meet with following example, (i think)
>
> create table t(a int not null);
> COPY t FROM STDIN WITH (on_error set_to_null, reject_limit 2);
> Enter data to be copied followed by a newline.
> End with a backslash and a period on a line by itself, or an EOF signal.
> >> a
> >> \.
> ERROR: null value in column "a" of relation "t" violates not-null constraint
> DETAIL: Failing row contains (null).
> CONTEXT: COPY t, line 1, column a: "a"

Sure, my v8 does not helps with column level NOT NULL constraint (or
other constraint)

> Overall, I think
> making the domain not-null align with column level not-null would be a
> good thing.

While this looks sane, it's actually a separate topic. Even on current
HEAD we have domain not-null vs column level not-null unalignment.

consider this example

```
reshke=# create table ftt2 (i int not null);
CREATE TABLE
reshke=# copy ftt2 from stdin with (reject_limit 1000, on_error ignore);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> \N
>> \.
ERROR: null value in column "i" of relation "ftt2" violates not-null constraint
DETAIL: Failing row contains (null).
CONTEXT: COPY ftt2, line 1: "\N"
reshke=# create domain dd as int not null ;
CREATE DOMAIN
reshke=# create table ftt3(i dd);
CREATE TABLE
reshke=# copy ftt3 from stdin with (reject_limit 1000, on_error ignore);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> \N
>> \.
NOTICE: 1 row was skipped due to data type incompatibility
COPY 0
reshke=#
```
So, if we want this, we need to start another thread and deal with
REJECT_LIMIT + on_error ignore first.

The ExecConstraints function is the source of the error in scenario 1.
Therefore, we require something like "ExecConstraintsSafe" to
accommodate the aforementioned. That is a significant change. Not sure
it will be accepted by the community.

>
> <para>
> Specifies how to behave when encountering an error converting a column's
> input value into its data type.
> An <replaceable class="parameter">error_action</replaceable> value of
> <literal>stop</literal> means fail the command,
> <literal>ignore</literal> means discard the input row and
> continue with the next one, and
> <literal>set_to_null</literal> means replace columns containing
> erroneous input values with <literal>null</literal> and move to the
> next row.
>
> "and move to the next row" is wrong?
> I think it should be " and move to the next field".

Yes, "and move to the next field" is correct.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Corey Huinker 2024-11-19 20:47:20 Re: Statistics Import and Export
Previous Message Tom Lane 2024-11-19 19:45:16 Re: Using Expanded Objects other than Arrays from plpgsql