Re: Add new error_action COPY ON_ERROR "log"

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, jian(dot)universality(at)gmail(dot)com, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add new error_action COPY ON_ERROR "log"
Date: 2024-03-03 23:30:00
Message-ID: CALj2ACXNA0focNeriYRvQQaCGc4CsTuOnFbzF9LqTKNWxuJdhA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Mar 1, 2024 at 10:22 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> > Nice catch. When COPY_ON_ERROR_STOP is specified, we use ereport's
> > soft error mechanism. An assertion seems a good choice to validate the
> > state is what we expect. Done that way.
>
> Hmm. I am not really on board with this patch, which would generate
> one NOTICE message each time a row is incompatible in the soft error
> mode. If you have a couple of billion rows to bulk-load into the
> backend and even 0.01% of them are corrupted, you could end up with
> more than 100k log entries, and all systems should be careful about
> the quantity of logs generated, especially if we use the syslogger,
> which could easily become a bottleneck.

Hm. I had some concerns about this too, as mentioned upthread. Thanks
a lot for illustrating it.
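
To put a rough number on that, here is a back-of-the-envelope sketch. It
assumes "a couple of billion" means 2 billion rows and one NOTICE per
malformed row; the function name is purely illustrative.

```python
def expected_notices(total_rows: int, bad_fraction: float) -> int:
    """Number of NOTICE messages if one is emitted per malformed row."""
    return round(total_rows * bad_fraction)


# Assumption: "a couple of billion" rows is taken as 2 billion.
# 0.01% expressed as a fraction is 0.0001.
notices = expected_notices(2_000_000_000, 0.0001)
print(notices)
```

Under those assumptions the count comes out comfortably above the 100k
figure mentioned above.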

> The existing ON_ERROR controls what to do on error. I think that we'd
> better control the amount of information reported with a completely
> separate option, an option even different than where to redirect
> errors (if required, which would be either the logs, the client, a
> pipe, a combination of these or even all of them).

How about an extra error_action value, ignore-with-verbose, which
behaves like ignore but additionally emits one NOTICE per malformed
row? With this, one can say COPY x FROM stdin (ON_ERROR
ignore-with-verbose);.

Alternatively, we can think of adding a separate new option, verbose,
which could be used not only for this but also for reporting other
COPY-related info, errors, etc. With this, one can say COPY x FROM stdin
(VERBOSE on, ON_ERROR ignore);.
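
To make the intended semantics of the second variant concrete, here is a
toy model in Python. It is only a sketch of the proposed behaviour, not
PostgreSQL code: copy_from, its parameters, and the message wording are
all made up for illustration, and rows are modelled as single integer
columns.

```python
from typing import Iterable, List, Tuple


def copy_from(lines: Iterable[str], on_error: str = "stop",
              verbose: bool = False) -> Tuple[List[int], List[str]]:
    """Toy model of a load: returns (loaded_rows, notices)."""
    if on_error not in ("stop", "ignore"):
        raise ValueError(f"unknown on_error value: {on_error!r}")
    loaded: List[int] = []
    notices: List[str] = []
    skipped = 0
    for lineno, line in enumerate(lines, start=1):
        try:
            loaded.append(int(line))
        except ValueError:
            if on_error == "stop":
                raise  # hard error: abort the whole load
            skipped += 1
            if verbose:
                # Proposed behaviour: one NOTICE per malformed row.
                notices.append(f"NOTICE: skipping line {lineno}: {line!r}")
    if skipped:
        # One summary message, emitted regardless of verbose.
        notices.append(f"NOTICE: {skipped} rows were skipped")
    return loaded, notices


rows, msgs = copy_from(["1", "x", "3", ""], on_error="ignore", verbose=True)
quiet_rows, quiet_msgs = copy_from(["1", "x", "3", ""], on_error="ignore")
```

The point of keeping verbose orthogonal to on_error is that what to do
with a bad row and how much to report about it can then be chosen
independently.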

Yet another way would be a separate GUC, but that is a -100 from me:
it not only increases the total number of GUCs by 1, but might also
set a wrong precedent of adding a new GUC to control command-level
output.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
