Re: Should CSV parsing be stricter about mid-field quotes?

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
Cc: "Andrew Dunstan" <andrew(at)dunslane(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Should CSV parsing be stricter about mid-field quotes?
Date: 2023-05-19 06:19:30
Message-ID: f594c28d-0da7-40d8-82f4-8e238ee3ccde@app.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, May 18, 2023, at 18:48, Daniel Verite wrote:
> Joel Jacobson wrote:
>> OTOH, one would then need to inspect the TSV file doesn't contain \. on an
>> empty line...
>
> Note that this is the case for valid CSV contents, since backslash-dot
> on a line by itself is both an end-of-data marker for COPY FROM and a
> valid CSV line.
> Having this line in the data results in either an error or having the
> rest of the data silently discarded, depending on the context. There
> is some previous discussion about this in [1].
> Since the TEXT format doesn't have this kind of problem, one solution
> is to filter the data through PROGRAM with an [untrusted CSV]->TEXT
> filter. This is to be preferred over direct CSV loading when
> strictness or robustness are more important than convenience.
>
>
> [1]
> https://www.postgresql.org/message-id/10e3eff6-eb04-4b3f-aeb4-b920192b977a@manitou-mail.org

Thanks for sharing the old thread, very useful.
I see I've failed miserably to understand all the details of the COPY command.

Upon reading the thread, I'm still puzzled about one thing:

Why does \. need to have a special meaning when using COPY FROM with files?

I understand its necessity for STDIN, given that the end of input needs to be
explicitly defined.
However, for files, we have a known file size and the end-of-file can be
detected without the need for special markers.

Also, is the difference in how server-side COPY CSV is capable of dealing
with \. but apparently not the client-side \COPY CSV documented somewhere?

CREATE TABLE t (c text);
INSERT INTO t (c) VALUES ('foo'), (E'\n\\.\n'), ('bar');

-- Works OK:
COPY t TO '/tmp/t.csv' WITH CSV;
TRUNCATE t;
COPY t FROM '/tmp/t.csv' WITH CSV;

-- Doesn't work:
\COPY t TO '/tmp/t.csv' WITH CSV;
TRUNCATE t;
\COPY t FROM '/tmp/t.csv' WITH CSV;
ERROR: unterminated CSV quoted field
CONTEXT: COPY t, line 4: ""
\.
"

/Joel

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Marina Polyakova 2023-05-19 06:39:39 Re: Conflict between regression tests namespace & transactions due to recent changes
Previous Message Michael Paquier 2023-05-19 06:03:29 Re: Conflict between regression tests namespace & transactions due to recent changes