Re: COPY TO CSV produces data that is incompatible/unsafe for \COPY FROM CSV

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Svante Richter <pgsql-bugs(at)richter(dot)id>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: COPY TO CSV produces data that is incompatible/unsafe for \COPY FROM CSV
Date: 2022-08-16 15:26:47
Message-ID: Yvu3N9Dn4iA5+8Up@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Sun, Aug 14, 2022 at 10:08:55AM -0400, Tom Lane wrote:
> Noah Misch <noah(at)leadboat(dot)com> writes:
> > The psql documentation says [\copy ... from stdin] looks for "\.". It says
> > nothing about doing that for [\copy ... from filename]. I wonder if psql
> > should change the filename case to send the whole file to the server, ignoring
> > "\." inside. (That is to say, change to match the psql documentation.)
>
> This seems like a sensible solution. The need to use an in-band EOF
> marker when reading from the command source file is clear, and there's
> no non-kluge way there. But that doesn't mean we should extend it
> to separate files. (I'm surprised to realize we do, actually.)

So, look what I found in psql/copy.c:

/* check for EOF marker, but not on a partial line */
if (at_line_begin)
{
/*
--> * This code erroneously assumes '\.' on a line alone
--> * inside a quoted CSV string terminates the \copy.
--> * https://www.postgresql.org/message-id/E1TdNVQ-0001ju-GO@wrigleys.postgresql.org
*/
if ((linelen == 3 && memcmp(fgresult, "\\.\n", 3) == 0) ||
(linelen == 4 && memcmp(fgresult, "\\.\r\n", 4) == 0))
{
copydone = true;
}
}

I wondered who would reference an email thread in the source code, so I
looked at the whole 2012 thread:

https://www.postgresql.org/message-id/flat/E1TdNVQ-0001ju-GO%40wrigleys.postgresql.org

and it was, of course, me. :-(

Tom's analysis was similar ten years ago:

> Ugh. This seems like a rather fundamental oversight in the CSV feature.
> The problem is that psql has no idea whether the copy is being done in
> CSV mode or not --- and even if it did, it doesn't parse the data fully
> enough to realize whether a \. line is inside quotes or not.
>
> In the case of out-of-line data files, it might be reasonable to just
> dispense with the check for \. altogether and always ship the whole file
> to the backend; I think there's a \. check on the backend side. (Not
> sure this is safe in V2 protocol, but I doubt anyone cares anymore
> about that.)
>
> In the case of in-line data in a script file, CSV mode seems a bit
> broken in any case; there's no concept of a terminator in CSV, AFAIK.
> So maybe we don't have to worry about that.

I think at this point we should either fix this or document it.

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com

Indecision is a decision. Inaction is an action. Mark Batterson

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Francisco Olarte 2022-08-16 16:08:14 Re: [PATCH] Fix segfault calling PQflush on invalid connection
Previous Message Amit Kapila 2022-08-16 11:36:45 Re: Excessive number of replication slots for 12->14 logical replication