
Re: Add tuples_skipped to pg_stat_progress_copy

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
Cc:	Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Add tuples_skipped to pg_stat_progress_copy
Date:	2024-01-24 08:05:29
Message-ID:	CAD21AoAxavoPmOZLQOvGm+T8x+ht4FbOL_EV4kUJR=+PK9A7kg@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Jan 23, 2024 at 1:02 AM torikoshia <torikoshia(at)oss(dot)nttdata(dot)com> wrote:
>
> On 2024-01-17 14:47, Masahiko Sawada wrote:
> > On Wed, Jan 17, 2024 at 2:22 PM torikoshia <torikoshia(at)oss(dot)nttdata(dot)com> wrote:
> >>
> >> Hi,
> >>
> >> 132de9968840c introduced SAVE_ERROR_TO option to COPY and enabled to
> >> skip malformed data, but there is no way to watch the number of
> >> skipped rows during COPY.
> >>
> >> Attached patch adds tuples_skipped to pg_stat_progress_copy, which
> >> counts the number of skipped tuples because source data is malformed.
> >> If SAVE_ERROR_TO is not specified, this column remains zero.
> >>
> >> The advantage would be that users can quickly notice and stop COPYing
> >> when there is a larger amount of skipped data than expected, for
> >> example.
> >>
> >> As described in commit log, it is expected to add more choices for
> >> SAVE_ERROR_TO like 'log' and using such options may enable us to know
> >> the number of skipped tuples during COPY, but exposed in
> >> pg_stat_progress_copy would be easier to monitor.
> >>
> >>
> >> What do you think?
> >
> > +1
> >
> > The patch is pretty simple. Here is a comment:
> >
> > + (if <literal>SAVE_ERROR_TO</literal> is specified, otherwise zero).
> > + </para></entry>
> > + </row>
> >
> > To be precise, this counter only advances when a value other than
> > 'ERROR' is specified to SAVE_ERROR_TO option.
>
> Thanks for your comment and review!
>
> Updated the patch according to your comment and option name change by
> b725b7eec.

Thanks! The patch looks good to me. I'm going to push it tomorrow,
barring any objections.

>
>
> BTW, based on this patch, I think we can add another option which
> specifies the maximum tolerable number of malformed rows.
> I remember this was discussed in [1], and feel it would be useful when
> loading 'dirty' data but there is a limit to how dirty it can be.
> Attached 0002 is a WIP patch for this (I haven't added docs yet).

Yeah, it could be a good option.

> This may be better discussed in another thread, but any comments (e.g.
> necessity of this option, option name) are welcome.

I'd recommend forking a new thread for this option. As far as I
remember, there also was an opinion that "reject limit" stuff is not
very useful.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Re: Add tuples_skipped to pg_stat_progress_copy at 2024-01-22 16:02:15 from torikoshia

Responses

Re: Add tuples_skipped to pg_stat_progress_copy at 2024-01-25 02:25:33 from torikoshia
