Re: [BUG] psql: Make \copy from 'text' and 'csv' formats fail on NUL bytes

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [BUG] psql: Make \copy from 'text' and 'csv' formats fail on NUL bytes
Date: 2024-11-11 10:45:27
Message-ID: 327d8dac-2321-4147-909f-6f18474ff4e1@app.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Nov 10, 2024, at 23:14, Tom Lane wrote:
> "Joel Jacobson" <joel(at)compiler(dot)org> writes:
>> On Sun, Nov 10, 2024, at 22:37, Tom Lane wrote:
>>> That seems like a hack, as it also changes the behavior w.r.t.
>>> prompts and EOF-mark detection, neither for the better.
>
>> Hmm, in what way does it change the behavior?
>> I ran almost all the tests, including the TAP ones, and none of them failed with
>> this fix. Is there some behavior that is currently not covered by the test suite?
>
> Of course. (Our regression tests are very far from covering 100% of
> the behavioral space.)
>
> In the case at hand, this would break the
> expected line-by-line prompting for "\copy ... from '/dev/tty'".

Yes, of course, just wondered what kind of behavior in this area
that wasn't tested, thanks for explaining it.

> Playing with this just now, I notice that the prompt you get
> still claims that \. works to end input, although right now
> it does not:
>
> regression=# \copy int8_tbl from '/dev/tty'
> Enter data to be copied followed by a newline.
> End with a backslash and a period on a line by itself, or an EOF signal.
>>> 1 3
>>> 5 6
>>> \.
>>> ^D
> COPY 2
>
> I'm inclined to think that the prompt still describes what should
> happen, at least in non-binary mode, although in binary mode we
> probably ought to just say (and do) "End with an EOF signal".
>
> So perhaps the if-test to choose the code path could be
>
> if ((isbinary || copystream != pset.cur_cmd_source) && !showprompt)

Thanks for guidance, that seems to fix, for 6/8 cases I've figured out how to test.

> which would allow dropping the vestigial prompt logic in the first
> path,

I assume you mean that due to "&& !showprompt" the "if (showprompt)"
becomes unreachable and can therefore be dropped?

> and we would also need to change the test in the second path
> that decides if we should check for \. (Likely this should be
> refactored a bit to make it more understandable. An intermediate
> flag saying whether we intend to check for \. might help.)

Maybe check_dot_command?

const bool check_dot_command = (copystream == pset.cur_cmd_source);

I haven't tried yet to refactor the code,
except than replacing the two "copystream == pset.cur_cmd_source"
occurrences with the new check_dot_command flag.

First want to understand if the two remaining cases are valid,
and if they can be tested:

> Anyway, my point is that we need to think through the desirable
> behavior for each possible combination of showprompt, isbinary, and
> copystream != pset.cur_cmd_source, because all 8 cases are reachable.

I guess these are the 8 cases?

+--------+-------------+----------+------------------+
| CASE | showprompt | isbinary | check_dot_command |
+--------+-------------+----------+------------------+
| 1 | false | false | false |
| 2 | false | false | true |
| 3 | false | true | false |
| 4 | false | true | true |
| 5 | true | false | false |
| 6 | true | false | true |
| 7* | true | true | false |
| 8* | true | true | true |
+--------+-------------+----------+------------------+
* Cases 7 and 8 not tested yet

With the changed if-test, case 1-6 works,
and for case 1, then binary mode branch is taken
instead of the text mode branch,
whereas cases 2-6 take the same branch as before.

joel(at)Joels-MBP psql_tester % git diff --no-index -U100 /tmp/psql.log.HEAD /tmp/psql.log
diff --git a/tmp/psql.log.HEAD b/tmp/psql.log
index 5e44e30..1f48ac9 100644
--- a/tmp/psql.log.HEAD
+++ b/tmp/psql.log
@@ -1,6 +1,6 @@
-COPY case: 1 TEXT MODE
+COPY case: 1 BINARY MODE
COPY case: 2 TEXT MODE
COPY case: 3 BINARY MODE
COPY case: 4 BINARY MODE
COPY case: 5 TEXT MODE
COPY case: 6 TEXT MODE

Here is how I tested each case:

# CASE 1:
# showprompt: false
# isbinary: false
# check_dot_command: false
psql -c "\copy int8_tbl from '/tmp/int8_tbl.data'"

# CASE 2:
# showprompt: false
# isbinary: false
# check_dot_command: true
psql -f /tmp/copy_stdin_text.sql

# CASE 3:
# showprompt: false
# isbinary: true
# check_dot_command: false
psql -c "\copy int8_tbl from '/tmp/int8_tbl.bin' (format binary)"

# CASE 4:
# showprompt: false
# isbinary: true
# check_dot_command: true
printf '\\copy int8_tbl from stdin (format binary)\n' >/tmp/copy_stdin_binary.sql
cat /tmp/int8_tbl.bin >>/tmp/copy_stdin_binary.sql
psql -f copy_stdin_binary.sql

# CASE 5:
# showprompt: true
# isbinary: false
# check_dot_command: false

psql
\copy int8_tbl from '/dev/tty'
17 18
19 20
\.
# Send EOF (Ctrl+D)

# CASE 6:
# showprompt: true
# isbinary: false
# check_dot_command: true

psql
\copy int8_tbl from stdin
21 22
23 24
\.

# CASE 7:
# showprompt: true
# isbinary: true
# check_dot_command: false

CASE 7 would be like CASE 5, with the addition of (format binary)
that is:

\copy int8_tbl from '/dev/tty' (format binary)

but I wonder if this is a valid case? Could we really copy/paste
or by some other means give psql binary data here?
I tried to wrap psql and send the binary content of
my '/tmp/int8_tbl.bin' file, and tried sending the EOF control code,
and I see I get PGCOPY-message, but the txn isn't committed,
so not sure what's happening. Before I continue trying
to figure this one out, I just wanted to make sure this is
a valid case, and how it is supposed to be used if so?

# CASE 8:
# showprompt: true
# isbinary: true
# check_dot_command: true

CASE 8 would be like CASE 6, with the addition of (format binary)
that is:

\copy int8_tbl from stdin (format binary)

I've had the same problem with this one, as with CASE 7.

Many thanks for guidance.

/Joel

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message torikoshia 2024-11-11 11:11:02 Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
Previous Message Ashutosh Bapat 2024-11-11 10:12:06 Re: Separate memory contexts for relcache and catcache