Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row

From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: jian he <jian(dot)universality(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>, torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
Date: 2025-03-11 10:31:06
Message-ID: 90dc6e9d-9348-485a-b27c-7b1637f06c2e@uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Jian

On 07.03.25 04:48, jian he wrote:
> hi.
> rebase only.

I revisited this patch today. It applies and builds cleanly, and it
works as expected.

Some tests and minor comments:

====

1) WARNING might be a better fit than NOTICE here.

postgres=# \pset null NULL
Null display is "NULL".
postgres=# CREATE TEMPORARY TABLE t (x int, y int, z text);
CREATE TABLE
postgres=# COPY t (x,y) FROM STDIN WITH (on_error set_to_null, format csv);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 1,a
>> 2,1
>> 3,2
>> 4,b
>> a,c
>> \.
NOTICE:  erroneous values in 3 rows were replaced with null
COPY 5
postgres=# SELECT * FROM t;
  x   |  y   |  z   
------+------+------
    1 | NULL | NULL
    2 |    1 | NULL
    3 |    2 | NULL
    4 | NULL | NULL
 NULL | NULL | NULL
(5 rows)

postgres=# \pset null NULL
Null display is "NULL".
postgres=# CREATE TEMPORARY TABLE t (x int, y int, z text);
CREATE TABLE
postgres=# COPY t (x,y) FROM STDIN WITH (on_error set_to_null, format
csv, log_verbosity verbose);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 1,a
>> 2,1
>> 3,2
>> 4,b
>> a,c
>> \.
NOTICE:  column "y" was set to null due to data type incompatibility at
line 1
NOTICE:  column "y" was set to null due to data type incompatibility at
line 4
NOTICE:  column "x" was set to null due to data type incompatibility at
line 5
NOTICE:  column "y" was set to null due to data type incompatibility at
line 5
NOTICE:  erroneous values in 3 rows were replaced with null
COPY 5
postgres=# SELECT * FROM t;
  x   |  y   |  z   
------+------+------
    1 | NULL | NULL
    2 |    1 | NULL
    3 |    2 | NULL
    4 | NULL | NULL
 NULL | NULL | NULL
(5 rows)

I would still leave the extra messages from "log_verbosity verbose" as
NOTICE though. What do you think?

====

2) Inconsistent terminology. Invalid values in "on_error set_to_null"
mode are names as "erroneous", but as "invalid" in "on_error stop" mode.
I don't want to get into the semantics of erroneous or invalid, but
sticking to one terminology would IMHO look better.

postgres=# \pset null NULL
Null display is "NULL".
postgres=# CREATE TEMPORARY TABLE t (x int, y int, z text);
CREATE TABLE
postgres=# COPY t (x,y) FROM STDIN WITH (on_error stop, format csv,
log_verbosity verbose);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 1,a
>> 2,1
>> 3,2
>> 4,b
>> a,c
>> \.
ERROR:  invalid input syntax for type integer: "a"
CONTEXT:  COPY t, line 1, column y: "a"
postgres=# SELECT * FROM t;
 x | y | z
---+---+---
(0 rows)

====

3) same as in 1)

postgres=# \pset null NULL
Null display is "NULL".
postgres=# CREATE TEMPORARY TABLE t (x int, y int, z text);
CREATE TABLE
postgres=# COPY t (x,y) FROM STDIN WITH (on_error ignore, format csv,
log_verbosity verbose);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 1,a
>> 2,1
>> 3,2
>> 4,b
>> a,c
>> \.
NOTICE:  skipping row due to data type incompatibility at line 1 for
column "y": "a"
NOTICE:  skipping row due to data type incompatibility at line 4 for
column "y": "b"
NOTICE:  skipping row due to data type incompatibility at line 5 for
column "x": "a"
NOTICE:  3 rows were skipped due to data type incompatibility
COPY 2
postgres=# SELECT * FROM t;
 x | y |  z   
---+---+------
 2 | 1 | NULL
 3 | 2 | NULL
(2 rows)====

====

"on_error ignore" works well with "reject_limit #"

postgres=# \pset null NULL
Null display is "NULL".
postgres=# CREATE TEMPORARY TABLE t (x int, y int, z text);
CREATE TABLE
postgres=# COPY t (x,y) FROM STDIN WITH (on_error ignore, format csv,
log_verbosity verbose, reject_limit 1);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 1,a
>> 2,1
>> 3,2
>> 4,b
>> a,c
>> \.
NOTICE:  skipping row due to data type incompatibility at line 1 for
column "y": "a"
NOTICE:  skipping row due to data type incompatibility at line 4 for
column "y": "b"
ERROR:  skipped more than REJECT_LIMIT (1) rows due to data type
incompatibility
CONTEXT:  COPY t, line 4, column y: "b"
postgres=# SELECT * FROM t;
 x | y | z
---+---+---
(0 rows)

best regards, Jim

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniil Davydov 2025-03-11 10:32:02 Re: Accessing an invalid pointer in BufferManagerRelation structure
Previous Message vignesh C 2025-03-11 10:30:45 Re: Add an option to skip loading missing publication to avoid logical replication failure