| From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
|---|---|
| To: | torikoshia <torikoshia(at)oss(dot)nttdata(dot)com> |
| Cc: | Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Add tuples_skipped to pg_stat_progress_copy |
| Date: | 2024-01-24 08:05:29 |
| Message-ID: | CAD21AoAxavoPmOZLQOvGm+T8x+ht4FbOL_EV4kUJR=+PK9A7kg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Tue, Jan 23, 2024 at 1:02 AM torikoshia <torikoshia(at)oss(dot)nttdata(dot)com> wrote:
>
> On 2024-01-17 14:47, Masahiko Sawada wrote:
> > On Wed, Jan 17, 2024 at 2:22 PM torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
> > wrote:
> >>
> >> Hi,
> >>
> >> 132de9968840c introduced SAVE_ERROR_TO option to COPY and enabled to
> >> skip malformed data, but there is no way to watch the number of
> >> skipped
> >> rows during COPY.
> >>
> >> Attached patch adds tuples_skipped to pg_stat_progress_copy, which
> >> counts the number of skipped tuples because source data is malformed.
> >> If SAVE_ERROR_TO is not specified, this column remains zero.
> >>
> >> The advantage would be that users can quickly notice and stop COPYing
> >> when there is a larger amount of skipped data than expected, for
> >> example.
> >>
> >> As described in commit log, it is expected to add more choices for
> >> SAVE_ERROR_TO like 'log' and using such options may enable us to know
> >> the number of skipped tuples during COPY, but exposed in
> >> pg_stat_progress_copy would be easier to monitor.
> >>
> >>
> >> What do you think?
> >
> > +1
> >
> > The patch is pretty simple. Here is a comment:
> >
> > + (if <literal>SAVE_ERROR_TO</literal> is specified, otherwise
> > zero).
> > + </para></entry>
> > + </row>
> >
> > To be precise, this counter only advances when a value other than
> > 'ERROR' is specified to SAVE_ERROR_TO option.
>
> Thanks for your comment and review!
>
> Updated the patch according to your comment and option name change by
> b725b7eec.
Thanks! The patch looks good to me. I'm going to push it tomorrow,
barring any objections.
>
>
> BTW, based on this patch, I think we can add another option which
> specifies the maximum tolerable number of malformed rows.
> I remember this was discussed in [1], and feel it would be useful when
> loading 'dirty' data but there is a limit to how dirty it can be.
> Attached 0002 is WIP patch for this(I haven't added doc yet).
Yeah, it could be a good option.
> This may be better discussed in another thread, but any comments(e.g.
> necessity of this option, option name) are welcome.
I'd recommend forking a new thread for this option. As far as I
remember, there also was an opinion that "reject limit" stuff is not
very useful.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Michael Paquier | 2024-01-24 08:11:49 | Re: Make COPY format extendable: Extract COPY TO format implementations |
| Previous Message | Ashutosh Bapat | 2024-01-24 07:02:30 | Re: partitioning and identity column |