From: | "Joel Jacobson" <joel(at)compiler(dot)org> |
---|---|
To: | "Daniel Verite" <daniel(at)manitou-mail(dot)org> |
Cc: | "Andrew Dunstan" <andrew(at)dunslane(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Should CSV parsing be stricter about mid-field quotes? |
Date: | 2023-05-19 06:19:30 |
Message-ID: | f594c28d-0da7-40d8-82f4-8e238ee3ccde@app.fastmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, May 18, 2023, at 18:48, Daniel Verite wrote:
> Joel Jacobson wrote:
>> OTOH, one would then need to inspect the TSV file doesn't contain \. on an
>> empty line...
>
> Note that this is the case for valid CSV contents, since backslash-dot
> on a line by itself is both an end-of-data marker for COPY FROM and a
> valid CSV line.
> Having this line in the data results in either an error or having the
> rest of the data silently discarded, depending on the context. There
> is some previous discussion about this in [1].
> Since the TEXT format doesn't have this kind of problem, one solution
> is to filter the data through PROGRAM with an [untrusted CSV]->TEXT
> filter. This is to be preferred over direct CSV loading when
> strictness or robustness are more important than convenience.
>
>
> [1]
> https://www.postgresql.org/message-id/10e3eff6-eb04-4b3f-aeb4-b920192b977a@manitou-mail.org
Thanks for sharing the old thread, very useful.
I see I've failed miserably to understand all the details of the COPY command.
Upon reading the thread, I'm still puzzled about one thing:
Why does \. need to have a special meaning when using COPY FROM with files?
I understand its necessity for STDIN, given that the end of input needs to be
explicitly defined.
However, for files, we have a known file size and the end-of-file can be
detected without the need for special markers.
Also, is the difference in how server-side COPY CSV is capable of dealing
with \. but apparently not the client-side \COPY CSV documented somewhere?
CREATE TABLE t (c text);
INSERT INTO t (c) VALUES ('foo'), (E'\n\\.\n'), ('bar');
-- Works OK:
COPY t TO '/tmp/t.csv' WITH CSV;
TRUNCATE t;
COPY t FROM '/tmp/t.csv' WITH CSV;
-- Doesn't work:
\COPY t TO '/tmp/t.csv' WITH CSV;
TRUNCATE t;
\COPY t FROM '/tmp/t.csv' WITH CSV;
ERROR: unterminated CSV quoted field
CONTEXT: COPY t, line 4: ""
\.
"
/Joel
From | Date | Subject | |
---|---|---|---|
Next Message | Marina Polyakova | 2023-05-19 06:39:39 | Re: Conflict between regression tests namespace & transactions due to recent changes |
Previous Message | Michael Paquier | 2023-05-19 06:03:29 | Re: Conflict between regression tests namespace & transactions due to recent changes |