| From: | torikoshia <torikoshia(at)oss(dot)nttdata(dot)com> | 
|---|---|
| To: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> | 
| Cc: | "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, jian(dot)universality(at)gmail(dot)com, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | Re: Add new error_action COPY ON_ERROR "log" | 
| Date: | 2024-02-14 11:34:01 | 
| Message-ID: | 3bad6fd5570c5b63e0b68f79e20253f2@oss.nttdata.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
On 2024-02-12 15:46, Bharath Rupireddy wrote:
Thanks for your comments.
> On Mon, Jan 29, 2024 at 8:41 AM torikoshia <torikoshia(at)oss(dot)nttdata(dot)com> 
> wrote:
>> 
>> On 2024-01-27 00:43, David G. Johnston wrote:
>> 
>> >> I'd like to have a new option "log", which skips soft errors and
>> >> logs
>> >> information that should have resulted in errors to PostgreSQL log.
>> >
>> > user-specified tables in the same database.  Setting up temporary
>> > tables or unlogged tables probably is going to be a more acceptable
>> > methodology than trying to get to the log files.
>> 
>> OTOH not everyone thinks saving table information is the best idea[2].
> 
> The added NOTICE message gives a summary of how many rows were
> skipped, but not the line numbers. It's hard for the users to find the
> malformed data, especially when they are bulk-inserting from data
> files of multiple GBs in size (hard to open such files in any editor
> too).
> 
> I understand the value of writing the info to server log or tables of
> users' choice as being discussed in a couple of other threads.
> However, I'd prefer we do the simplest thing first before we go debate
> server log or table - let the users know what line numbers are
> containing the errors that COPY ignored something like [1] with a
> simple change like [2].
Agreed.
Unlike my patch, it hides the error information(i.e. 22P02: invalid 
input syntax for type integer: ), but I feel that it's usually 
sufficient to know the row number and column where the error occurred.
> It not only helps users figure out which rows
> and attributes were malformed, but also helps them redirect them to
> server logs with setting log_min_messages = notice [3]. In the worst
> case scenario, a problem with this one NOTICE per malformed row is
> that it can overload the psql session if all the rows are malformed.
> I'm not sure if this is a big problem, but IMO better than a single
> summary NOTICE message and simpler than writing to tables of users'
> choice.
Maybe could we do what you suggested for the behavior when 'log' is set 
to on_error?
> Thoughts?
> 
> FWIW, I presented the new COPY ... ON_ERROR option feature at
> Hyderabad PostgreSQL User Group meetup and it was well-received by the
> audience. The audience felt it's going to be extremely helpful for
> bulk-loading tasks. Thank you all for working on this feature.
Thanks for informing it!
I hope it will not be reverted as the audience:)
> [1]
> postgres=# CREATE TABLE check_ign_err (n int, m int[], k int);
> CREATE TABLE
> postgres=# COPY check_ign_err FROM STDIN WITH (on_error ignore);
> Enter data to be copied followed by a newline.
> End with a backslash and a period on a line by itself, or an EOF 
> signal.
>>> 1    {1}     1
> a       {2}     2
> 3       {3}     3333333333
> 4       {a, 4}  4
> 
> 5       {5}>> >> >> >> >>       5
> \.>>
> NOTICE:  detected data type incompatibility at line number 2 for
> column check_ign_err, COPY n
> NOTICE:  detected data type incompatibility at line number 3 for
> column check_ign_err, COPY k
> NOTICE:  detected data type incompatibility at line number 4 for
> column check_ign_err, COPY m
> NOTICE:  detected data type incompatibility at line number 5 for
> column check_ign_err, COPY n
> NOTICE:  4 rows were skipped due to data type incompatibility
> COPY 2
> 
> [2]
> diff --git a/src/backend/commands/copyfromparse.c
> b/src/backend/commands/copyfromparse.c
> index 906756362e..93ab5d4ebb 100644
> --- a/src/backend/commands/copyfromparse.c
> +++ b/src/backend/commands/copyfromparse.c
> @@ -961,7 +961,16 @@ NextCopyFrom(CopyFromState cstate, ExprContext 
> *econtext,
> 
>                  &values[m]))
>                         {
>                                 cstate->num_errors++;
> -                               return true;
> +
> +                               if (cstate->opts.on_error != 
> COPY_ON_ERROR_STOP)
> +                               {
> +                                       ereport(NOTICE,
> +
> errmsg("detected data type incompatibility at line number %llu for
> column %s, COPY %s",
> +
> (unsigned long long) cstate->cur_lineno,
> +
> cstate->cur_relname,
> +
> cstate->cur_attname));
> +                                       return true;
> +                               }
>                         }
> 
>                         cstate->cur_attname = NULL;
> 
> [3]
> 2024-02-12 06:20:29.363 UTC [427892] NOTICE:  detected data type
> incompatibility at line number 2 for column check_ign_err, COPY n
> 2024-02-12 06:20:29.363 UTC [427892] CONTEXT:  COPY check_ign_err,
> line 2, column n: "a"
> 2024-02-12 06:20:29.363 UTC [427892] NOTICE:  detected data type
> incompatibility at line number 3 for column check_ign_err, COPY k
> 2024-02-12 06:20:29.363 UTC [427892] CONTEXT:  COPY check_ign_err,
> line 3, column k: "3333333333"
> 2024-02-12 06:20:29.363 UTC [427892] NOTICE:  detected data type
> incompatibility at line number 4 for column check_ign_err, COPY m
> 2024-02-12 06:20:29.363 UTC [427892] CONTEXT:  COPY check_ign_err,
> line 4, column m: "{a, 4}"
> 2024-02-12 06:20:29.363 UTC [427892] NOTICE:  detected data type
> incompatibility at line number 5 for column check_ign_err, COPY n
> 2024-02-12 06:20:29.363 UTC [427892] CONTEXT:  COPY check_ign_err,
> line 5, column n: ""
> 2024-02-12 06:20:29.363 UTC [427892] NOTICE:  4 rows were skipped due
> to data type incompatibility
> 
> --
> Bharath Rupireddy
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com
-- 
Regards,
--
Atsushi Torikoshi
NTT DATA Group Corporation
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Jim Jones | 2024-02-14 11:36:49 | Re: Psql meta-command conninfo+ | 
| Previous Message | Amit Kapila | 2024-02-14 11:13:32 | Re: About a recently-added message |