Re: csv format for psql

From: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
To: "Michael Paquier" <michael(at)paquier(dot)xyz>
Cc: "Fabien COELHO" <coelho(at)cri(dot)ensmp(dot)fr>,"PostgreSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: csv format for psql
Date: 2018-11-09 16:28:07
Message-ID: ea4145e4-8c7c-4541-af24-500f1e47de88@manitou-mail.org
Lists: pgsql-hackers

Michael Paquier wrote:

> Still what's the point except complicating the code? We don't care
> about anything fancy in the backend-side ProcessCopyOptions() when
> checking cstate->delim, and having some consistency looks like a good
> thing to me.

The backend has its reasons, which don't apply to the psql output
format: mostly import performance, according to [1].
It's not that nobody wants a delimiter outside of US-ASCII,
as people do ask for that sometimes:

https://www.postgresql.org/message-id/f02ulk%242r3u%241%40news.hub.org
https://github.com/greenplum-db/gpdb/issues/1246

> However there is no option to specify
> an escape character, no option to specify a quote character, and it is
> not possible to force quotes for all values. Those are huge advantages
> as any output can be made compatible with other CSV variants. Isn't
> what is presented too limited?

The guidelines that the patch has been following are those of RFC 4180 [2],
with two exceptions: the field separator, which we can define,
and the end of lines, which are OS-dependent instead of the fixed CRLF
that the IETF seems to see as the norm.

The only reference to escaping in the RFC is:
"If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it with
another double quote"
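
To make that rule concrete, here is a minimal Python sketch of RFC 4180-style
quoting with a configurable separator and an OS-dependent end of line. It is
not the patch's code: the names csv_field and csv_line are hypothetical, and
it assumes a field is quoted only when it contains the separator, a double
quote, or a line break.

```python
import os


def csv_field(value: str, sep: str = ",") -> str:
    """Return one field, quoted only if it contains a special character.

    Hypothetical helper for illustration; not taken from the psql patch.
    """
    # A field needs quoting if it contains the separator, a double quote,
    # or a line break (CR or LF).
    needs_quotes = any(ch in value for ch in (sep, '"', "\n", "\r"))
    if not needs_quotes:
        return value
    # RFC 4180: a double quote inside a quoted field is escaped by
    # preceding it with another double quote.
    return '"' + value.replace('"', '""') + '"'


def csv_line(fields, sep: str = ",", eol: str = os.linesep) -> str:
    """Join fields into one record, ending with the given line terminator."""
    return sep.join(csv_field(f, sep) for f in fields) + eol
```

With this sketch, csv_field('say "hi"') yields "say ""hi""" and a field
containing the separator is wrapped in double quotes; no separate ESCAPE or
QUOTE character is involved.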

The problem with using a non-standard QUOTE or ESCAPE is that it's a
violation of the format that goes further than choosing a separator
other than a comma, which is already a pain point.
We can always add these options later if there is demand. I suspect it
will never happen.

I looked at the 2004 archives from when CSV was added to COPY (that's
around commit 862b20b38, in case anyone cares to look), but
I couldn't find a discussion of these options. All I could find is that they were
present from the start.

But again, COPY is concerned with importing data that preexists,
even if it's weird, whereas psql output formats are not.

[1] https://www.postgresql.org/message-id/4C9D2BC5.1080006%40optonline.net
[2] https://tools.ietf.org/html/rfc4180

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
