Re: Add reject_limit option to file_fdw

From: torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
To: Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>
Cc: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add reject_limit option to file_fdw
Date: 2024-11-19 12:40:30
Message-ID: a7d70e35229e2075b371cd1e81222244@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2024-11-13 23:17, Yugo NAGATA wrote:
> On Wed, 13 Nov 2024 21:48:10 +0900
> torikoshia <torikoshia(at)oss(dot)nttdata(dot)com> wrote:
>
>> On 2024-11-12 14:51, Yugo Nagata wrote:
>>
>> Thanks for your review!
>>
>> > On Tue, 12 Nov 2024 10:16:50 +0900
>> > torikoshia <torikoshia(at)oss(dot)nttdata(dot)com> wrote:
>> >
>> >> On 2024-11-12 01:49, Fujii Masao wrote:
>> >> > On 2024/11/11 21:45, torikoshia wrote:
>> >> >>> Thanks for adding the comment. It clearly states that REJECT_LIMIT
>> >> >>> can be
>> >> >>> a single-quoted string. However, it might also be helpful to mention
>> >> >>> that
>> >> >>> it can be provided as an int64 in the COPY command option. How about
>> >> >>> updating it like this?
>> >> >>>
>> >> >>> ------------------------------------
>> >> >>> REJECT_LIMIT can be specified in two ways: as an int64 for the COPY
>> >> >>> command
>> >> >>> option or as a single-quoted string for the foreign table option
>> >> >>> using
>> >> >>> file_fdw. Therefore this function needs to handle both formats.
>> >> >>> ------------------------------------
>> >> >>
>> >> >> Thanks! it seems better.
>> >> >>
>> >> >>
>> >> >> Attached v3 patch.
>> >> >
>> >> > Thanks for updating the patch! It looks like you forgot to attach it,
>> >> > though.
>> >>
>> >> Oops, thanks for pointing it out.
>> >> Here it is.
>> >
>> > I have one minor comment.
>> >
>> > + ereport(ERROR,
>> > + (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
>> > + errmsg("skipped more than REJECT_LIMIT (%lld) rows due to data
>> > type incompatibility",
>> > + (long long) cstate->opts.reject_limit)));
>> >
>> > Do we have to keep this message consistent with ones of COPY command?
>> > With foreign data wrappers, it seems common that the option name is
>> > passed in
>> > lowercase, so is it better to use "skipped more than reject_limit ..."
>> > rather
>> > than using uppercase?
>>
>> Agreed.
>> Attached v4 patch.
>
>
> Thank you for updating the patch.
>
> However, I noticed that file_fdw could output the following error
> using uppercase
> when an invalid value is set to the reject_limit option,
>
> ALTER FOREIGN TABLE agg_bad OPTIONS (ADD reject_limit '-1');
> ERROR: REJECT_LIMIT (-1) must be greater than zero
>
> as well as if reject_limit is set when on_error != 'ignore'.
>
> ALTER FOREIGN TABLE agg_bad OPTIONS (ADD reject_limit '1'); -- ERROR
> ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
>
> Also, in the regression tests, I found similar behaviours on existing
> options,
> for example;
>
> CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format
> 'xml'); -- ERROR
> ERROR: COPY format "xml" not recognized
> CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format
> 'text', quote ':'); -- ERROR
> ERROR: COPY QUOTE requires CSV mode
> CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format
> 'text', escape ':'); -- ERROR
> ...
>
> These messages may be unexpected for some users because the
> documentation of
> fild_fdw does not explicitly describe that file_fdw uses COPY
> internally.
> (I can find several wordings like "as COPY", though.)
> However, since the current file_fdw already has such messages, I came
> to think
> making the messages on "reject_limit" consistent with COPY is
> reasonable.
> I mean, the previous version of the message seems fine.

Agreed. Thanks for your investigation.

> As another comment, do we need to add validator test for on_error and
> reject_limit as similar to other options?

That might be better.
Added some queries which had wrong options to cause errors.

I was unsure whether to add it to "validator test" section or "on_error,
log_verbosity and reject_limit tests" section, but I chose "validator
test" as I believe it makes things clearer.

Added 0002 patch since some of the tests are not related to reject_limit
but just on_error.

--
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.

Attachment Content-Type Size
v5-0001-file_fdw-Add-reject_limit-option-to-file_fdw.patch text/x-diff 7.1 KB
v5-0002-Add-validator-tests-for-on_error-options.patch text/x-diff 2.1 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2024-11-19 12:50:50 Re: Statistics Import and Export
Previous Message Ashutosh Bapat 2024-11-19 12:19:15 Re: meson and check-tests