From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | Daniel Verite <daniel(at)manitou-mail(dot)org> |
Cc: | Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: backslash-dot quoting in COPY CSV |
Date: | 2019-01-28 21:47:25 |
Message-ID: | 20190128214725.GI26761@momjian.us |
Lists: | pgsql-hackers |
On Mon, Jan 28, 2019 at 04:06:17PM +0100, Daniel Verite wrote:
> Michael Paquier wrote:
>
> > In src/bin/psql/copy.c, handleCopyIn():
> >
> >     /*
> >      * This code erroneously assumes '\.' on a line alone
> >      * inside a quoted CSV string terminates the \copy.
> >      *
> >      * http://www.postgresql.org/message-id/E1TdNVQ-0001ju-GO@wrigleys.postgresql.org
> >      */
> >     if (strcmp(buf, "\\.\n") == 0 ||
> >         strcmp(buf, "\\.\r\n") == 0)
> >     {
> >         copydone = true;
> >         break;
> >     }
>
> Indeed, it's exactly that problem.
> And there's the related problem that it derails the input stream
> in a way that lines of data become commands, but that one is
> not specific to that particular error.
>
> For the backslash-dot in a quoted string, the root cause is
> that psql is not aware that the contents are CSV so it can't
> parse them properly.
> I can think of several ways of working around that, more or less
> inelegant:
>
> - the end of data could be expressed as a length (in number of lines
> for instance) instead of an in-data marker.
>
> - the end of data could be configurable, as in the MIME structure of
> multipart mail messages, where a part is ended by a "boundary"
> line, generally a long randomly generated string. This boundary
> would have to be known to psql through setting a dedicated
> variable or command.
>
> - COPY as the SQL command could have the boundary option
> for data fed through its STDIN. This could neutralize the
> special role of backslash-dot in general, not just in quoted fields,
> since the necessity to quote backslash-dot is a wart anyway.
Well, all of these more or less require a change to the COPY format,
which hasn't changed in many years.
> - psql could be told somehow that the next piece of inline data is in
> the CSV format, and then pass it through a CSV parser.
That might be the cleanest solution, but how would we actually input
multi-line data in CSV mode with \. alone on a line?
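
To make the question concrete, here is a minimal sketch of what a
quote-aware end-of-data check might look like. It is not psql code:
CsvScanState, csv_scan_line() and csv_line_is_end_marker() are
hypothetical names, and the sketch assumes the default CSV settings
where the quote and escape characters are both '"', so a doubled quote
inside a field toggles the state twice and leaves it unchanged. A '\.'
line is treated as the terminator only when no quoted field is open.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper state, not part of psql: remembers across input
 * lines whether we are currently inside a quoted CSV field.
 */
typedef struct CsvScanState
{
	bool		in_quote;	/* true while a quoted field is still open */
	char		quote;		/* quote character; assumed equal to escape */
} CsvScanState;

/* Update the quote state from one line of input. */
static void
csv_scan_line(CsvScanState *st, const char *line)
{
	assert(st != NULL && line != NULL);

	for (const char *p = line; *p != '\0'; p++)
	{
		/* A doubled quote ("") toggles twice, so the state is unchanged. */
		if (*p == st->quote)
			st->in_quote = !st->in_quote;
	}
}

/*
 * Return true if this line should end the COPY data.  A backslash-dot
 * line counts as the end marker only outside a quoted field; otherwise
 * the line is treated as data and the quote state is advanced.
 */
static bool
csv_line_is_end_marker(CsvScanState *st, const char *line)
{
	assert(st != NULL && line != NULL);

	if (!st->in_quote &&
		(strcmp(line, "\\.\n") == 0 ||
		 strcmp(line, "\\.\r\n") == 0))
		return true;

	csv_scan_line(st, line);
	return false;
}
```

With something like this, a '\.' line that falls inside an open quoted
field would be passed through as data, while an unquoted '\.' on a line
by itself would still end the input. It does not address a custom
ESCAPE character, nor the question of how psql learns that the inline
data is CSV in the first place.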
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +