From: | Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp> |
---|---|
To: | Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> |
Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Enhance file_fdw to report processed and skipped tuples in COPY progress |
Date: | 2024-10-11 06:36:45 |
Message-ID: | 20241011153645.a348de1576a3f57092c68355@sraoss.co.jp |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, 11 Oct 2024 10:53:10 +0900
Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>
>
> On 2024/10/04 2:12, Masahiko Sawada wrote:
> > Hi,
> >
> > On Thu, Oct 3, 2024 at 2:23 AM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
> >>
> >> Hi,
> >>
> >> Currently, file_fdw updates several columns in the pg_stat_progress_copy view,
> >> like relid and bytes_processed, but it doesn't track tuples_processed or
> >> tuples_skipped. Monitoring these would be particularly useful when handling
> >> large data sets via file_fdw, as it helps track the progress of scan.
> >>
> >> The attached patch updates file_fdw to add support for reporting
> >> the number of tuples processed and skipped (due to on_error = 'ignore')
> >> in the pg_stat_progress_copy view. What are your thoughts?
> >
> > While the patch works fine and looks good to me, in the first place,
> > it seems to me that the fact that file_fdw uses the COPY progress
> > itself doesn't work properly. For example, unlike COPY command,
> > queries could have multiple scans on one or more flie_fdw foreign
> > tables when joining tables. I found the discussion for that[1]: there
> > was a proposal of disabling COPY progress for file_fdw but the votes
> > are split. I think it would be better to consider if we really want to
> > support COPY progress for file_fdw before supporting more progress
> > information.
>
> Yes, you're right. We need to address how to handle multiple commands
> that trigger progress reporting when executed concurrently, at first.
>
> The current progress reporting mechanism assumes only one command
> triggering progress is running at a time, as each backend has just
> one memory area for progress reporting. If multiple commands run simultaneously,
> the progress data would be incorrect. As you mentioned, this could happen
> when querying multiple file_fdw foreign tables, where multiple COPY commands
> could execute concurrently.
>
> However, this issue already exists without the proposed patch.
> Since file_fdw already reports progress partially, querying multiple
> file_fdw tables can lead to inaccurate or confusing progress reports.
> You can even observe this when analyzing a file_fdw table and also
> when copying to the table with a trigger that executes progress-reporting
> commands.
>
> So, I don’t think this issue should block the proposed patch.
> In fact, progress reporting is already flawed in these scenarios,
> regardless of whether the patch is applied.
>
> On the other hand, in many cases where a single file_fdw table is scanned,
> COPY progress reporting works correctly for file_fdw and is useful.
> Therefore, I believe it's still worth improving file_fdw’s progress reporting.
I think reporting tuples_processed and tuples_skipped columns additionally
in file_fdw is reasonable, since it already reports bytes_processed and bytes_total.
By the way, in the documentation of fild_fdw, it is not explicitly described
that file_fdw uses COPY internally, although I can find several wordings like "as COPY".
To prevent users to face unexpected experiences, how about explaining explicitly that
file_fdw uses COPY and updates pg_stat_progress_copy?
> To prevent misleading reports when multiple commands are run concurrently,
> just idea, we could consider displaying NULL columns in the progress report
> if this situation is detected, as a separate patch.
Or, could we add additional field to pg_stat_progress_copy to
show how much commands are running COPY? It is also just idea.
Regards,
Yugo Nagata
--
Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>
From | Date | Subject | |
---|---|---|---|
Next Message | Laurenz Albe | 2024-10-11 06:44:40 | Re: On disable_cost |
Previous Message | Andrei Lepikhov | 2024-10-11 06:21:37 | Re: allowing extensions to control planner behavior |