| From: | "Joel Jacobson" <joel(at)compiler(dot)org> | 
|---|---|
| To: | "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> | 
| Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org | 
| Subject: | Re: [BUG] psql: Make \copy from 'text' and 'csv' formats fail on NUL bytes | 
| Date: | 2024-11-11 10:45:27 | 
| Message-ID: | 327d8dac-2321-4147-909f-6f18474ff4e1@app.fastmail.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
On Sun, Nov 10, 2024, at 23:14, Tom Lane wrote:
> "Joel Jacobson" <joel(at)compiler(dot)org> writes:
>> On Sun, Nov 10, 2024, at 22:37, Tom Lane wrote:
>>> That seems like a hack, as it also changes the behavior w.r.t.
>>> prompts and EOF-mark detection, neither for the better.
>
>> Hmm, in what way does it change the behavior?
>> I ran almost all the tests, including the TAP ones, and none of them failed with
>> this fix. Is there some behavior that is currently not covered by the test suite?
>
> Of course.  (Our regression tests are very far from covering 100% of
> the behavioral space.)
>
> In the case at hand, this would break the
> expected line-by-line prompting for "\copy ... from '/dev/tty'".
Yes, of course, just wondered what kind of behavior in this area
that wasn't tested, thanks for explaining it.
> Playing with this just now, I notice that the prompt you get
> still claims that \. works to end input, although right now
> it does not:
>
> regression=# \copy int8_tbl from '/dev/tty'
> Enter data to be copied followed by a newline.
> End with a backslash and a period on a line by itself, or an EOF signal.
>>> 1    3
>>> 5    6
>>> \.
>>> ^D
> COPY 2
>
> I'm inclined to think that the prompt still describes what should
> happen, at least in non-binary mode, although in binary mode we
> probably ought to just say (and do) "End with an EOF signal".
>
> So perhaps the if-test to choose the code path could be
>
> 	if ((isbinary || copystream != pset.cur_cmd_source) && !showprompt)
Thanks for guidance, that seems to fix, for 6/8 cases I've figured out how to test.
> which would allow dropping the vestigial prompt logic in the first
> path,
I assume you mean that due to "&& !showprompt" the "if (showprompt)"
becomes unreachable and can therefore be dropped?
> and we would also need to change the test in the second path
> that decides if we should check for \.  (Likely this should be
> refactored a bit to make it more understandable.  An intermediate
> flag saying whether we intend to check for \. might help.)
Maybe check_dot_command?
const bool check_dot_command = (copystream == pset.cur_cmd_source);
I haven't tried yet to refactor the code,
except than replacing the two "copystream == pset.cur_cmd_source"
occurrences with the new check_dot_command flag.
First want to understand if the two remaining cases are valid,
and if they can be tested:
> Anyway, my point is that we need to think through the desirable
> behavior for each possible combination of showprompt, isbinary, and
> copystream != pset.cur_cmd_source, because all 8 cases are reachable.
I guess these are the 8 cases?
+--------+-------------+----------+------------------+
| CASE   | showprompt  | isbinary | check_dot_command |
+--------+-------------+----------+------------------+
|   1    |    false    |   false  |      false      |
|   2    |    false    |   false  |      true       |
|   3    |    false    |   true   |      false      |
|   4    |    false    |   true   |      true       |
|   5    |    true     |   false  |      false      |
|   6    |    true     |   false  |      true       |
|   7*   |    true     |   true   |      false      |
|   8*   |    true     |   true   |      true       |
+--------+-------------+----------+------------------+
* Cases 7 and 8 not tested yet
With the changed if-test, case 1-6 works,
and for case 1, then binary mode branch is taken
instead of the text mode branch,
whereas cases 2-6 take the same branch as before.
joel(at)Joels-MBP psql_tester % git diff --no-index -U100 /tmp/psql.log.HEAD /tmp/psql.log
diff --git a/tmp/psql.log.HEAD b/tmp/psql.log
index 5e44e30..1f48ac9 100644
--- a/tmp/psql.log.HEAD
+++ b/tmp/psql.log
@@ -1,6 +1,6 @@
-COPY case: 1 TEXT MODE
+COPY case: 1 BINARY MODE
 COPY case: 2 TEXT MODE
 COPY case: 3 BINARY MODE
 COPY case: 4 BINARY MODE
 COPY case: 5 TEXT MODE
 COPY case: 6 TEXT MODE
Here is how I tested each case:
# CASE 1:
# showprompt:        false
# isbinary:          false
# check_dot_command: false
psql -c "\copy int8_tbl from '/tmp/int8_tbl.data'"
# CASE 2:
# showprompt:        false
# isbinary:          false
# check_dot_command: true
psql -f /tmp/copy_stdin_text.sql
# CASE 3:
# showprompt:        false
# isbinary:          true
# check_dot_command: false
psql -c "\copy int8_tbl from '/tmp/int8_tbl.bin' (format binary)"
# CASE 4:
# showprompt:        false
# isbinary:          true
# check_dot_command: true
printf '\\copy int8_tbl from stdin (format binary)\n' >/tmp/copy_stdin_binary.sql
cat /tmp/int8_tbl.bin >>/tmp/copy_stdin_binary.sql
psql -f copy_stdin_binary.sql
# CASE 5:
# showprompt:        true
# isbinary:          false
# check_dot_command: false
psql
\copy int8_tbl from '/dev/tty'
17	18
19	20
\.
# Send EOF (Ctrl+D)
# CASE 6:
# showprompt:        true
# isbinary:          false
# check_dot_command: true
psql
\copy int8_tbl from stdin
21	22
23	24
\.
# CASE 7:
# showprompt:        true
# isbinary:          true
# check_dot_command: false
CASE 7 would be like CASE 5, with the addition of (format binary)
that is:
\copy int8_tbl from '/dev/tty' (format binary)
but I wonder if this is a valid case? Could we really copy/paste
or by some other means give psql binary data here?
I tried to wrap psql and send the binary content of
my '/tmp/int8_tbl.bin' file, and tried sending the EOF control code,
and I see I get PGCOPY-message, but the txn isn't committed,
so not sure what's happening. Before I continue trying
to figure this one out, I just wanted to make sure this is
a valid case, and how it is supposed to be used if so?
# CASE 8:
# showprompt:        true
# isbinary:          true
# check_dot_command: true
CASE 8 would be like CASE 6, with the addition of (format binary)
that is:
\copy int8_tbl from stdin (format binary)
I've had the same problem with this one, as with CASE 7.
Many thanks for guidance.
/Joel
| From | Date | Subject | |
|---|---|---|---|
| Next Message | torikoshia | 2024-11-11 11:11:02 | Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row | 
| Previous Message | Ashutosh Bapat | 2024-11-11 10:12:06 | Re: Separate memory contexts for relcache and catcache |