Re: Add new error_action COPY ON_ERROR "log"

From: torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, jian(dot)universality(at)gmail(dot)com, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add new error_action COPY ON_ERROR "log"
Date: 2024-02-14 11:34:01
Message-ID: 3bad6fd5570c5b63e0b68f79e20253f2@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2024-02-12 15:46, Bharath Rupireddy wrote:

Thanks for your comments.

> On Mon, Jan 29, 2024 at 8:41 AM torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
> wrote:
>>
>> On 2024-01-27 00:43, David G. Johnston wrote:
>>
>> >> I'd like to have a new option "log", which skips soft errors and
>> >> logs
>> >> information that should have resulted in errors to PostgreSQL log.
>> >
>> > user-specified tables in the same database. Setting up temporary
>> > tables or unlogged tables probably is going to be a more acceptable
>> > methodology than trying to get to the log files.
>>
>> OTOH not everyone thinks saving table information is the best idea[2].
>
> The added NOTICE message gives a summary of how many rows were
> skipped, but not the line numbers. It's hard for the users to find the
> malformed data, especially when they are bulk-inserting from data
> files of multiple GBs in size (hard to open such files in any editor
> too).
>
> I understand the value of writing the info to server log or tables of
> users' choice as being discussed in a couple of other threads.
> However, I'd prefer we do the simplest thing first before we go debate
> server log or table - let the users know what line numbers are
> containing the errors that COPY ignored something like [1] with a
> simple change like [2].

Agreed.
Unlike my patch, it hides the error information(i.e. 22P02: invalid
input syntax for type integer: ), but I feel that it's usually
sufficient to know the row number and column where the error occurred.

> It not only helps users figure out which rows
> and attributes were malformed, but also helps them redirect them to
> server logs with setting log_min_messages = notice [3]. In the worst
> case scenario, a problem with this one NOTICE per malformed row is
> that it can overload the psql session if all the rows are malformed.
> I'm not sure if this is a big problem, but IMO better than a single
> summary NOTICE message and simpler than writing to tables of users'
> choice.

Maybe could we do what you suggested for the behavior when 'log' is set
to on_error?

> Thoughts?
>
> FWIW, I presented the new COPY ... ON_ERROR option feature at
> Hyderabad PostgreSQL User Group meetup and it was well-received by the
> audience. The audience felt it's going to be extremely helpful for
> bulk-loading tasks. Thank you all for working on this feature.

Thanks for informing it!
I hope it will not be reverted as the audience:)

> [1]
> postgres=# CREATE TABLE check_ign_err (n int, m int[], k int);
> CREATE TABLE
> postgres=# COPY check_ign_err FROM STDIN WITH (on_error ignore);
> Enter data to be copied followed by a newline.
> End with a backslash and a period on a line by itself, or an EOF
> signal.
>>> 1 {1} 1
> a {2} 2
> 3 {3} 3333333333
> 4 {a, 4} 4
>
> 5 {5}>> >> >> >> >> 5
> \.>>
> NOTICE: detected data type incompatibility at line number 2 for
> column check_ign_err, COPY n
> NOTICE: detected data type incompatibility at line number 3 for
> column check_ign_err, COPY k
> NOTICE: detected data type incompatibility at line number 4 for
> column check_ign_err, COPY m
> NOTICE: detected data type incompatibility at line number 5 for
> column check_ign_err, COPY n
> NOTICE: 4 rows were skipped due to data type incompatibility
> COPY 2
>
> [2]
> diff --git a/src/backend/commands/copyfromparse.c
> b/src/backend/commands/copyfromparse.c
> index 906756362e..93ab5d4ebb 100644
> --- a/src/backend/commands/copyfromparse.c
> +++ b/src/backend/commands/copyfromparse.c
> @@ -961,7 +961,16 @@ NextCopyFrom(CopyFromState cstate, ExprContext
> *econtext,
>
> &values[m]))
> {
> cstate->num_errors++;
> - return true;
> +
> + if (cstate->opts.on_error !=
> COPY_ON_ERROR_STOP)
> + {
> + ereport(NOTICE,
> +
> errmsg("detected data type incompatibility at line number %llu for
> column %s, COPY %s",
> +
> (unsigned long long) cstate->cur_lineno,
> +
> cstate->cur_relname,
> +
> cstate->cur_attname));
> + return true;
> + }
> }
>
> cstate->cur_attname = NULL;
>
> [3]
> 2024-02-12 06:20:29.363 UTC [427892] NOTICE: detected data type
> incompatibility at line number 2 for column check_ign_err, COPY n
> 2024-02-12 06:20:29.363 UTC [427892] CONTEXT: COPY check_ign_err,
> line 2, column n: "a"
> 2024-02-12 06:20:29.363 UTC [427892] NOTICE: detected data type
> incompatibility at line number 3 for column check_ign_err, COPY k
> 2024-02-12 06:20:29.363 UTC [427892] CONTEXT: COPY check_ign_err,
> line 3, column k: "3333333333"
> 2024-02-12 06:20:29.363 UTC [427892] NOTICE: detected data type
> incompatibility at line number 4 for column check_ign_err, COPY m
> 2024-02-12 06:20:29.363 UTC [427892] CONTEXT: COPY check_ign_err,
> line 4, column m: "{a, 4}"
> 2024-02-12 06:20:29.363 UTC [427892] NOTICE: detected data type
> incompatibility at line number 5 for column check_ign_err, COPY n
> 2024-02-12 06:20:29.363 UTC [427892] CONTEXT: COPY check_ign_err,
> line 5, column n: ""
> 2024-02-12 06:20:29.363 UTC [427892] NOTICE: 4 rows were skipped due
> to data type incompatibility
>
> --
> Bharath Rupireddy
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jim Jones 2024-02-14 11:36:49 Re: Psql meta-command conninfo+
Previous Message Amit Kapila 2024-02-14 11:13:32 Re: About a recently-added message