Re: Add on_error and log_verbosity options to file_fdw

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add on_error and log_verbosity options to file_fdw
Date: 2024-10-02 00:27:06
Message-ID: CAD21AoCt4UTGs-V_sGewCdjvmjhgnnOUG3T83tRF=VtbdW-dPg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Mon, Sep 30, 2024 at 8:36 AM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>
>
>
> On 2024/09/26 21:57, torikoshia wrote:
> > Updated the patches.
>
> Thanks for updating the patches! I’ve made some changes based on your work, which are attached.
> Barring any objections, I'm thinking to push these patches.
>
> For patches 0001 and 0003, I ran pgindent and updated the commit message.
>
> Regarding patch 0002:
>
> - I updated the regression test to run ANALYZE on the file_fdw foreign table
> since the on_error option also affects the ANALYZE command. To ensure test
> stability, the test now runs ANALYZE with log_verbosity = 'silent'.
>
> - I removed the code that updated the count of skipped rows for
> the pg_stat_progress_copy view. As far as I know, file_fdw doesn’t
> currently support tracking pg_stat_progress_copy.tuples_processed.
> Supporting only tuples_skipped seems inconsistent, so I suggest creating
> a separate patch to extend file_fdw to track both tuples_processed and
> tuples_skipped in this view.
>
> - I refactored the for-loop handling on_error = 'ignore' in fileIterateForeignScan()
> by replacing it with a goto statement for improved readability.
>
> - I modified file_fdw to log a NOTICE message about skipped rows at the end of
> ANALYZE if any rows are skipped due to the on_error = 'ignore' setting.
>
> Regarding the "file contains XXX rows" message reported by the ANALYZE VERBOSE
> command on the file_fdw foreign table, what number should be reflected in XXX,
> especially when some rows are skipped due to on_error = 'ignore'?
> Currently, the count only includes valid rows, excluding any skipped rows.
> I haven't modified this code yet. Should we instead count all rows
> (valid and erroneous) and report that total?
>
> I noticed the code for reporting the number of skipped rows due to
> on_error = 'ignore' appears in three places. I’m considering creating
> a common function for this reporting to eliminate redundancy but haven’t
> implemented it yet.
>
> - I’ve updated the commit message and run pgindent.

Sorry for being late in joining the review of this patch. Both 0001
and 0003 look good to me. I have two comments on the 0002 patch:

- found = NextCopyFrom(festate->cstate, econtext,
- slot->tts_values,
slot->tts_isnull);
- if (found)
+
+retry:
+ if (NextCopyFrom(cstate, econtext, slot->tts_values, slot->tts_isnull))
+ {
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
+ cstate->escontext->error_occurred)
+ {
+ /*
+ * Soft error occurred, skip this tuple and just make
+ * ErrorSaveContext ready for the next
NextCopyFrom. Since we
+ * don't set details_wanted and error_data is
not to be filled,
+ * just resetting error_occurred is enough.
+ */
+ cstate->escontext->error_occurred = false;
+
+ /* Repeat NextCopyFrom() until no soft error occurs */
+ goto retry;
+ }
+
ExecStoreVirtualTuple(slot);
+ }

I think that while scanning a file_fdw foreign table with
log_verbosity='silent' the query is not interruptible.

Also, we don't switch to the per-tuple memory context when retrying
due to a soft error. I'm not sure it's okay as in CopyFrom(), a
similar function for COPY command, we switch to the per-tuple memory
context every time before parsing an input line. Would it be
problematic if we switch to another memory context while parsing an
input line? In CopyFrom() we also call ResetPerTupleExprContext() and
ExecClearTuple() for every input, so we might want to consider calling
them for every input.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Takeshi Ideriha 2024-10-02 00:36:53 Re: BUG #18641: Logical decoding of two-phase commit fails with TOASTed default values
Previous Message Jim Jones 2024-10-01 23:08:43 Re: Truncate logs by max_log_size