Re: COPY TO CSV produces data that is incompatible/unsafe for \COPY FROM CSV

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Svante Richter <pgsql-bugs(at)richter(dot)id>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: COPY TO CSV produces data that is incompatible/unsafe for \COPY FROM CSV
Date: 2023-10-28 15:38:37
Message-ID: ZT0q_djqSO3d6MQU@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Tue, Aug 16, 2022 at 11:26:47AM -0400, Bruce Momjian wrote:
> On Sun, Aug 14, 2022 at 10:08:55AM -0400, Tom Lane wrote:
> > Noah Misch <noah(at)leadboat(dot)com> writes:
> > > The psql documentation says [\copy ... from stdin] looks for "\.". It says
> > > nothing about doing that for [\copy ... from filename]. I wonder if psql
> > > should change the filename case to send the whole file to the server, ignoring
> > > "\." inside. (That is to say, change to match the psql documentation.)
> >
> > This seems like a sensible solution. The need to use an in-band EOF
> > marker when reading from the command source file is clear, and there's
> > no non-kluge way there. But that doesn't mean we should extend it
> > to separate files. (I'm surprised to realize we do, actually.)
>
> So, look what I found in psql/copy.c:
>
> /* check for EOF marker, but not on a partial line */
> if (at_line_begin)
> {
> /*
> --> * This code erroneously assumes '\.' on a line alone
> --> * inside a quoted CSV string terminates the \copy.
> --> * https://www.postgresql.org/message-id/E1TdNVQ-0001ju-GO@wrigleys.postgresql.org
> */
> if ((linelen == 3 && memcmp(fgresult, "\\.\n", 3) == 0) ||
> (linelen == 4 && memcmp(fgresult, "\\.\r\n", 4) == 0))
> {
> copydone = true;
> }
> }
>
> I wondered who would reference an email thread in the source code, so I
> looked at the whole 2012 thread:
>
> https://www.postgresql.org/message-id/flat/E1TdNVQ-0001ju-GO%40wrigleys.postgresql.org
>
> and it was, of course, me. :-(
>
> Tom's analysis was similar ten years ago:
>
> > Ugh. This seems like a rather fundamental oversight in the CSV feature.
> > The problem is that psql has no idea whether the copy is being done in
> > CSV mode or not --- and even if it did, it doesn't parse the data fully
> > enough to realize whether a \. line is inside quotes or not.
> >
> > In the case of out-of-line data files, it might be reasonable to just
> > dispense with the check for \. altogether and always ship the whole file
> > to the backend; I think there's a \. check on the backend side. (Not
> > sure this is safe in V2 protocol, but I doubt anyone cares anymore
> > about that.)
> >
> > In the case of in-line data in a script file, CSV mode seems a bit
> > broken in any case; there's no concept of a terminator in CSV, AFAIK.
> > So maybe we don't have to worry about that.
>
> I think at this point we should either fix this or document it.

I have decided to document this with the attached patch. I plan to
apply it to all supported Postgres versions. If we ever fix FROM file
so only FROM STDIN is a problem, we can update the documentation.

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.

Attachment Content-Type Size
copy_csv.diff text/x-diff 1.4 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2023-10-28 18:15:08 Re: BUG #18172: High memory usage in tSRF function context
Previous Message Andrei Lepikhov 2023-10-28 11:00:55 Re: BUG #18170: Unexpected error: no relation entry for relid 3