From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | jian he <jian(dot)universality(at)gmail(dot)com> |
Cc: | vignesh C <vignesh21(at)gmail(dot)com>, Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>, torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row |
Date: | 2025-04-07 22:41:24 |
Message-ID: | CAD21AoBWwEzpp3Z6tp-O-AQGftK-kuj6vRva2L5daWoWBtbnRg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sat, Apr 5, 2025 at 1:31 AM jian he <jian(dot)universality(at)gmail(dot)com> wrote:
>
> On Sat, Apr 5, 2025 at 5:33 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Fri, Apr 4, 2025 at 4:55 AM jian he <jian(dot)universality(at)gmail(dot)com> wrote:
> > >
> > > On Tue, Mar 25, 2025 at 2:31 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > > >
> > > > 2) Here in error we say column c1 violates not-null constraint and in
> > > > the context we show column c2, should the context also display c2
> > > > column:
> > > > postgres=# create table t3(c1 int not null, c2 int, check (c1 > 10));
> > > > CREATE TABLE
> > > > postgres=# COPY t3 FROM STDIN WITH (on_error set_to_null);
> > > > Enter data to be copied followed by a newline.
> > > > End with a backslash and a period on a line by itself, or an EOF signal.
> > > > >> a b
> > > > >> \.
> > > > ERROR: null value in column "c1" of relation "t3" violates not-null constraint
> > > > DETAIL: Failing row contains (null, null).
> > > > CONTEXT: COPY t3, line 1, column c2: "b"
> > > >
> > >
> > > It took me a while to figure out why.
> > > with the attached, now the error message becomes:
> > >
> > > ERROR: null value in column "c1" of relation "t3" violates not-null constraint
> > > DETAIL: Failing row contains (null, null).
> > > CONTEXT: COPY t3, line 1: "a,b"
> > >
> > > while at it,
> > > (on_error set_to_null, log_verbosity verbose)
> > > error message CONTEXT will only emit out relation name,
> > > this aligns with (on_error ignore, log_verbosity verbose).
> > >
> > > one of the message out example:
> > > +NOTICE: column "b" was set to null due to data type incompatibility at line 2
> > > +CONTEXT: COPY t_on_error_null
> > >
> > >
> > >
> > > > 3) typo becomen should be become:
> > > > null will becomen reserved to non-reserved
> > > fixed.
> > >
> > > > 4) There is a whitespace error while applying patch
> > > > Applying: COPY (on_error set_to_null)
> > > > .git/rebase-apply/patch:39: trailing whitespace.
> > > > a <literal>NOTICE</literal> message indicating the number of rows
> > > > warning: 1 line adds whitespace errors.
> > > fixed.
> >
> > I've reviewed the v15 patch and here are some comments:
> >
> > How about renaming the new option value to 'set_null"? The 'to' in the
> > value name seems redundant to me.
> >
> > ---
> > + COPY_ON_ERROR_NULL, /* set error field to null */
> >
> > I think it's better to rename COPY_ON_ERROR_SET_TO_NULL (or
> > COPY_ON_ERROR_SET_NULL if we change the option value name) for
> > consistency with the value name.
> >
> > ---
> > + else if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
> > + ereport(NOTICE,
> > + errmsg_plural("invalid values
> > in %" PRIu64 " row was replaced with null",
> > +
> > "invalid values in %" PRIu64 " rows were replaced with null",
> > +
> > cstate->num_errors,
> > +
> > cstate->num_errors));
> >
> > How about adding "due to data type incompatibility" at the end of the message?
> >
> > ---
> > + ereport(NOTICE,
> > + errmsg("column
> > \"%s\" was set to null due to data type incompatibility at line %"
> > PRIu64 "",
> > +
> > cstate->cur_attname,
> > +
> > cstate->cur_lineno));
> >
> > Similar to the IGNORE case, we can show the data in question in the message.
> >
> > ---
> > + else
> > + ereport(ERROR,
> > +
> > errcode(ERRCODE_NOT_NULL_VIOLATION),
> > + errmsg("domain %s does
> > not allow null values", format_type_be(typioparams[m])),
> > + errdatatype(typioparams[m]));
> >
> > If domain data type is the sole case where not to accept NULL, can we
> > check it beforehand to avoid calling the second
> > InputFunctionCallSafe() for non-domain data types? Also, if we want to
> > end up with an error when setting NULL to a domain type with NOT NULL,
> > I think we don't need to try to handle a soft error by passing
> > econtext to InputFunctionCallSafe().
> >
>
> please check attached, hope i have addressed all the points you've mentioned.
>
>
> > If domain data type is the sole case where not to accept NULL, can we
> > check it beforehand to avoid calling the second
> > InputFunctionCallSafe() for non-domain data types?
>
> I doubt it.
>
> we have
> InputFunctionCallSafe(FmgrInfo *flinfo, char *str,
> Oid typioparam, int32 typmod,
> fmNodePtr escontext,
> Datum *result)
> {
> LOCAL_FCINFO(fcinfo, 3);
> if (str == NULL && flinfo->fn_strict)
> {
> *result = (Datum) 0; /* just return null result */
> return true;
> }
> }
>
> Most of the non-domain type input functions are strict.
> see query result:
>
> select proname, pt.typname, proisstrict,pt.typtype
> from pg_type pt
> join pg_proc pp on pp.oid = pt.typinput
> where pt.typtype <> 'd'
> and pt.typtype <> 'p'
> and proisstrict is false;
>
> so the second InputFunctionCallSafe will be faster for non-domain types.
Agreed.
BTW have you measured the overheads of calling InputFunctionCallSafe
twice? If it's significant, we might want to find other ways to
achieve it as it would not be good to incur overhead just for
relatively rare cases.
Here are some comments:
+ if (InputFunctionCallSafe(&in_functions[m],
+ NULL,
+ typioparams[m],
+ att->atttypmod,
+ NULL,
+ &values[m]))
Given that we pass NULL to escontext, does this function return false
in an error case? Or can we use InputFunctionCall instead?
I think we should mention that SET_NULL still could fail if the data
type of the column doesn't accept NULL.
How about restructuring the codes around handling data incompatibility
errors like:
else if (!InputFunctionCallSafe(...))
{
if (cstate->opts.on_error == IGNORE)
{
cstate->num_errors++;
if (cstate->opts.log_verbosity == VERBOSE)
write a NOTICE message;
return true; // ignore whole row.
}
else if (cstate->opts.on_error == SET_NULL)
{
current_row_erroneous = true;
set NULL to the column;
if (cstate->opts.log_verbosity == VERBOSE)
write a NOTICE message;
continue; // go to the next column.
}
That way, we have similar structures for both on_error handling and
don't need to reset cstate->cur_attname at the end of SET_NULL
handling.
---
From the regression tests:
--fail, column a is domain with not-null constraint
COPY t_on_error_null FROM STDIN WITH (on_error set_null);
a 11 14
\.
ERROR: domain d_int_not_null does not allow null values
CONTEXT: COPY t_on_error_null, line 1, column a: "a"
I guess that the log messages could confuse users since while the
actual error was caused by setting NULL to the non-NULL domain type
column, the context message says the data 'a' was erroneous.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2025-04-07 23:00:48 | Re: Fwd: [BUG]: the walsender does not update its IO statistics until it exits |
Previous Message | Bruce Momjian | 2025-04-07 22:37:49 | Re: libpq maligning postgres stability |