Re: [BUG FIX] Fix validation of COPY options FORCE_NOT_NULL/FORCE_NULL

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: "Michael Paquier" <michael(at)paquier(dot)xyz>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [BUG FIX] Fix validation of COPY options FORCE_NOT_NULL/FORCE_NULL
Date: 2024-10-18 06:27:44
Message-ID: f2558d62-3be6-4aa2-94d5-94823f892bd1@app.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Oct 18, 2024, at 00:52, Michael Paquier wrote:
> On Thu, Oct 17, 2024 at 10:21:54AM +0200, Joel Jacobson wrote:
>> I believe the correction should be to use COPY TO stdout instead of
>> COPY FROM stdin, since FORCE_QUOTE is only applicable with COPY TO.
>> This way, the test correctly verifies the disallowed combination of FORCE_QUOTE
>> with a non-CSV format in COPY TO.
>>
>> I see how this is easy to miss, since the tests for the three FORCE_* options
>> look very similar, but they are different in that it's only FORCE_QUOTE that
>> is allowed for COPY TO, whereas the two other FORCE_NOT_NULL
>> and FORCE_NULL are only allowed for COPY FROM.
>
> This one is intentional. I was looking at the full picture of these
> queries yesterday, and decided to maximize the number of COPY FROM for
> all the queries that do not check option interactions that rely on TO
> or FROM to exist. Hence, the tests are now shaped so as COPY TO is
> only used where we expect it to be set, while the sequence is lossy
> only with COPY FROM.

Thank you for your response and for refining the tests.

I believe we should use COPY TO specifically when testing that the FORCE_QUOTE
option requires CSV mode. Using COPY TO isolates this validation without relying
on implementation details like the current order of checks in ProcessCopyOptions,
which seems unnecessary.

While it's beneficial to maximize the use of COPY FROM in tests where the
direction doesn't matter, in this case, COPY TO or COPY FROM does matter because
FORCE_QUOTE is only valid with COPY TO. This differs from options like QUOTE,
ESCAPE, ENCODING, and NULL, which are valid for both directions.

I think the effort to standardize overshot in this instance, as FORCE_QUOTE
is the _only_ option that is only valid for COPY TO.

Therefore, in the incorrect options tests, we should use COPY TO when testing
FORCE_QUOTE, except for the test that checks using FORCE_QUOTE with COPY FROM
is an error.

As the tests are currently written, the expected error messages are produced
only by coincidence, relying on the current order of checks:

/* Check force_quote */
if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
/* ERROR: COPY FORCE_QUOTE requires CSV mode */
if ((opts_out->force_quote || opts_out->force_quote_all) && is_from)
/* ERROR: COPY FORCE_QUOTE cannot be used with COPY FROM */

First, it checks for CSV mode,
then it checks if FORCE_QUOTE is used with COPY FROM.

Designing a test that combines two incorrect options, where only one error is
reported, seems unnecessary. If both errors were reported, we might eliminate
the need for separate tests, but that's not the case here.

I believe test specificity and clarity are important. Using COPY FROM in this
context is confusing and makes the tests harder to maintain, as testing two
things at once, deviates from the pattern established by other tests in
this section.

To improve test specificity and maintainability,
I suggest changing the tests as follows:

-- incorrect options
COPY x from stdin (format BINARY, delimiter ',');
COPY x from stdin (format BINARY, null 'x');
COPY x from stdin (format BINARY, on_error ignore);
COPY x from stdin (on_error unsupported);
-COPY x from stdin (format TEXT, force_quote(a));
-COPY x from stdin (format TEXT, force_quote *);
+COPY x to stdout (format TEXT, force_quote(a));
+COPY x to stdout (format TEXT, force_quote *);
COPY x from stdin (format CSV, force_quote(a));
COPY x from stdin (format CSV, force_quote *);
COPY x from stdin (format TEXT, force_not_null(a));
COPY x from stdin (format TEXT, force_not_null *);
COPY x to stdout (format CSV, force_not_null(a));
COPY x to stdout (format CSV, force_not_null *);
COPY x from stdin (format TEXT, force_null(a));
COPY x from stdin (format TEXT, force_null *);
COPY x to stdout (format CSV, force_null(a));
COPY x to stdout (format CSV, force_null *);
COPY x to stdout (format BINARY, on_error unsupported);
COPY x from stdin (log_verbosity unsupported);
COPY x from stdin with (reject_limit 1);
COPY x from stdin with (on_error ignore, reject_limit 0);

Please let me know your thoughts on this.

/Joel

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jakub Wartak 2024-10-18 07:49:47 Re: Syncrep and improving latency due to WAL throttling
Previous Message Michael Paquier 2024-10-18 06:27:29 Re: Set query_id for query contained in utility statement