Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: jian he <jian(dot)universality(at)gmail(dot)com>
Cc: torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru>, Damir Belyalov <dam(dot)bel07(at)gmail(dot)com>, zhihuifan1213(at)163(dot)com, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Daniel Gustafsson <daniel(at)yesql(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, anisimow(dot)d(at)gmail(dot)com, HukuToc(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Subject: Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Date: 2023-12-19 01:13:31
Message-ID: CAD21AoARNqwiE53uywdRkfc4jE-Asi_KA9abQjDw33ft2mMp8g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Dec 18, 2023 at 4:41 PM jian he <jian(dot)universality(at)gmail(dot)com> wrote:
>
> On Mon, Dec 18, 2023 at 1:09 PM torikoshia <torikoshia(at)oss(dot)nttdata(dot)com> wrote:
> >
> > Hi,
> >
> > > save the error metadata to system catalogs would be more expensive,
> > > please see below explanation.
> > > I have no knowledge of publications.
> > > but i feel there is a feature request: publication FOR ALL TABLES
> > > exclude regex_pattern.
> > > Anyway, that would be another topic.
> >
> > I think saving error metadata to system catalog is not a good idea, too.
> > And I believe Sawada-san just pointed out missing features and did not
> > suggested that we use system catalog.
> >
> > > I don't think "specify the maximum number of errors to tolerate
> > > before raising an ERROR." is very useful....
> >
> > That may be so.
> > I imagine it's useful in some use case since some loading tools have
> > such options.
> > Anyway I agree it's not necessary for initial patch as mentioned in [1].
> >
> > > I suppose we can specify an ERRORFILE directory. similar
> > > implementation [2], demo in [3]
> > > it will generate 2 files, one file shows the malform line content as
> > > is, another file shows the error info.
> >
> > That may be a good option when considering "(2) logging errors to
> > somewhere".
> >
> > What do you think about the proposal to develop these features in
> > incrementally?
> >
>
> I am more with tom's idea [1], that is when errors happen (data type
> conversion only), do not fail, AND we save the error to a table. I
> guess we can implement this logic together, only with a new COPY
> option.

If we want only such a feature we need to implement it together (the
patch could be split, though). But if some parts of the feature are
useful for users as well, I'd recommend implementing it incrementally.
That way, the patches can get small and it would be easy for reviewers
and committers to review/commit them.

>
> imagine a case (it's not that contrived, imho), while conversion from
> text to table's int, postgres isspace is different from the source
> text file's isspace logic.
> then all the lines are malformed. If we just say on error continue and
> not save error meta info, the user is still confused which field has
> the wrong data, then the user will probably try to incrementally test
> which field contains malformed data.
>
> Since we need to save the error somewhere.
> Everyone has the privilege to INSERT can do COPY.
> I think we also need to handle the access privilege also.
> So like I mentioned above, one copy_error error table hub, then
> everyone can view/select their own copy failure record.

The error table hub idea is still unclear to me. I assume that there
are error tables at least on each database. And an error table can
have error data that happened during COPY FROM, including malformed
lines. Do the error tables grow without bounds and the users have to
delete rows at some point? If so, who can do that? How can we achieve
that the users can see only errored rows they generated? And the issue
with logical replication also needs to be resolved. Anyway, if we go
this direction, we need to discuss the overall design.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Japin Li 2023-12-19 01:25:27 Re: Transaction timeout
Previous Message Masahiko Sawada 2023-12-19 00:28:08 Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)