Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row

From: jian he <jian(dot)universality(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>, torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
Date: 2025-04-08 10:53:25
Message-ID: CACJufxGEHmijmP-QqvrmqU6cxmhgpdjY7ewQBQ=E9NmdyEcqmw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Apr 8, 2025 at 6:42 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
>
> BTW have you measured the overheads of calling InputFunctionCallSafe
> twice? If it's significant, we might want to find other ways to
> achieve it as it would not be good to incur overhead just for
> relatively rare cases.
>

Please check the attached two patches
v17-0001-COPY-on_error-set_null.original,
v17-0001-COPY-on_error-set_null.patch

for non-domain types, (on_error set_null), the performance of these
two are the same.
for domain type with or without constraint,
(on_error set_null): v17.original is slower than v17.patch.

test script:

create unlogged table t2(a text);
insert into t2 select 'a' from generate_Series(1, 10_000_000) g;
copy t2 to '/tmp/2.txt';
CREATE DOMAIN d1 AS INT ;
CREATE DOMAIN d2 AS INT check (value > 0);
create unlogged table t3(a int);
create unlogged table t4(a d1);
create unlogged table t5(a d2);

performance result:
v17-0001-COPY-on_error-set_null.patch
-- 764.903 ms
copy t3 from '/tmp/2.txt' (on_error set_null) \watch c=10 i=0.1
-- 779.253 ms
copy t4 from '/tmp/2.txt' (on_error set_null) \watch c=10 i=0.1
-- Time: 750.390 ms
copy t5 from '/tmp/2.txt' (on_error set_null) \watch c=10 i=0.1

v17-0001-COPY-on_error-set_null.original
-- 774.943 ms
copy t3 from '/tmp/2.txt' (on_error set_null) \watch c=10 i=0.1
-- 867.671 ms
copy t4 from '/tmp/2.txt' (on_error set_null) \watch c=10 i=0.1
-- 927.685 ms
copy t5 from '/tmp/2.txt' (on_error set_null) \watch c=10 i=0.1

> Here are some comments:
>
> + if (InputFunctionCallSafe(&in_functions[m],
> + NULL,
> + typioparams[m],
> + att->atttypmod,
> + NULL,
> + &values[m]))
>
> Given that we pass NULL to escontext, does this function return false
> in an error case? Or can we use InputFunctionCall instead?
>
> I think we should mention that SET_NULL still could fail if the data
> type of the column doesn't accept NULL.
>
> How about restructuring the codes around handling data incompatibility
> errors like:
>
> else if (!InputFunctionCallSafe(...))
> {
> if (cstate->opts.on_error == IGNORE)
> {
> cstate->num_errors++;
> if (cstate->opts.log_verbosity == VERBOSE)
> write a NOTICE message;
> return true; // ignore whole row.
> }
> else if (cstate->opts.on_error == SET_NULL)
> {
> current_row_erroneous = true;
> set NULL to the column;
> if (cstate->opts.log_verbosity == VERBOSE)
> write a NOTICE message;
> continue; // go to the next column.
> }
>
> That way, we have similar structures for both on_error handling and
> don't need to reset cstate->cur_attname at the end of SET_NULL
> handling.
>

I think we still need to reset cstate->cur_attname.
the current code structure is
``
foreach(cur, cstate->attnumlist)
{
if (condition x)
continue;
cstate->cur_attname = NULL;
cstate->cur_attval = NULL;
}
``
In some cases (last column , condition x is satisfied), once we reach
the ``continue``, then we cannot reach.
``
cstate->cur_attname = NULL;
cstate->cur_attval = NULL;
``

> ---
> From the regression tests:
>
> --fail, column a is domain with not-null constraint
> COPY t_on_error_null FROM STDIN WITH (on_error set_null);
> a 11 14
> \.
> ERROR: domain d_int_not_null does not allow null values
> CONTEXT: COPY t_on_error_null, line 1, column a: "a"
>
> I guess that the log messages could confuse users since while the
> actual error was caused by setting NULL to the non-NULL domain type
> column, the context message says the data 'a' was erroneous.
>

if the second function is InputFunctionCall, then we cannot customize
the error message.
we can't have both.
I guess we need a second InputFunctionCallSafe with escontext NOT NULL.

now i change it to
if (!cstate->domain_with_constraint[m] ||
InputFunctionCallSafe(&in_functions[m],
NULL,
typioparams[m],
att->atttypmod,
(Node *) cstate->escontext,
&values[m]))
else if (string == NULL)
ereport(ERROR,
errcode(ERRCODE_NOT_NULL_VIOLATION),
errmsg("domain %s does not allow null
values", format_type_be(typioparams[m])),
errdatatype(typioparams[m]));
else
ereport(ERROR,
errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
errmsg("invalid input value for domain %s: \"%s\"",
format_type_be(typioparams[m]), string));

do these ``ELSE IF``, ``ELSE`` error report messages make sense to you?

Attachment Content-Type Size
v17-0001-COPY-on_error-set_null.original application/octet-stream 22.3 KB
v17-0001-COPY-on_error-set_null.patch text/x-patch 23.9 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2025-04-08 11:14:02 Re: Large expressions in indexes can't be stored (non-TOASTable)
Previous Message Tomas Vondra 2025-04-08 10:46:16 Re: Draft for basic NUMA observability