Re: Should CSV parsing be stricter about mid-field quotes?

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
Cc: "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Noah Misch" <noah(at)leadboat(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Should CSV parsing be stricter about mid-field quotes?
Date: 2024-10-11 13:04:39
Message-ID: 0f540f0c-f2ec-40ff-b3c9-bc2229dc3bb1@app.fastmail.com
Lists: pgsql-hackers

On Thu, Oct 10, 2024, at 10:37, Daniel Verite wrote:
> Joel Jacobson wrote:
>
>> - No Headers or Metadata:
>
> It's not clear why it's necessary to disable the HEADER option
> for this format?

It's not necessary, no; I just couldn't see a use case,
since I had only thought about the COPY FROM case,
where one would be dealing with unstructured, undelimited
text files, such as log files coming from some other system,
which I've never seen have header rows.

However, thanks to your question, I see how a user
might want to use the raw format to export a text
column "as is" using COPY TO, in which case it would
be useful to use HEADER and then HEADER MATCH
for COPY FROM.

I therefore think the HEADER option should be supported
for the new raw format.

>> The format does not support header rows or end-of-data markers;
>> every line is treated as data.
>
> With COPY FROM STDIN followed by inline data in a script,
> an end-of-data marker is required. That's also a problem
> for CSV except it's mitigated by the possibility of quoting
> (using "\." instead of \.)

Right. As long as \. has no special meaning for the raw format
except in the STDIN case, that seems fine.

I haven't looked at that part of the code in detail yet though.
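To make the rule concrete, here is a minimal C sketch. The function
raw_line_is_end_of_data() is hypothetical. It is not taken from the
PostgreSQL source or from my patch. It assumes the caller passes the
line without its trailing newline:

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not PostgreSQL code): decide whether a line read
 * in the proposed raw format ends the data stream.
 * Assumption: "\." is an end-of-data marker only when the data comes
 * from STDIN; when reading a file, it is ordinary data like any other line.
 * "line" must not include the trailing newline.
 */
static bool
raw_line_is_end_of_data(const char *line, size_t len, bool from_stdin)
{
    if (!from_stdin)
        return false;           /* file input: every line is data */

    /* the line must consist of exactly backslash + period */
    return len == 2 && memcmp(line, "\\.", 2) == 0;
}
```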

As a preparatory step, I think we should replace the two
bool fields "binary" and "csv_mode" in CopyFormatOptions
with a new "format" field of a new CopyFormat enum type.

If we instead introduced yet another bool field, I think the code
would become too cluttered.
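This is roughly what I have in mind, as a simplified sketch. The
enum value names and parse_copy_format() are illustrative and may not
match the attached patch. The struct shows only the new field:

```c
#include <stdbool.h>
#include <string.h>

/* Illustrative sketch; names may differ from the actual patch. */
typedef enum CopyFormat
{
    COPY_FORMAT_TEXT = 0,       /* default when FORMAT is not given */
    COPY_FORMAT_BINARY,
    COPY_FORMAT_CSV,
    COPY_FORMAT_RAW             /* proposed new format */
} CopyFormat;

typedef struct CopyFormatOptions
{
    CopyFormat  format;         /* replaces bool "binary" and "csv_mode" */
    /* other COPY options omitted from this sketch */
} CopyFormatOptions;

/*
 * Hypothetical helper: map a FORMAT option value to the enum.
 * Returns false if the name is not recognized.
 */
static bool
parse_copy_format(const char *name, CopyFormat *format)
{
    if (strcmp(name, "text") == 0)
        *format = COPY_FORMAT_TEXT;
    else if (strcmp(name, "binary") == 0)
        *format = COPY_FORMAT_BINARY;
    else if (strcmp(name, "csv") == 0)
        *format = COPY_FORMAT_CSV;
    else if (strcmp(name, "raw") == 0)
        *format = COPY_FORMAT_RAW;
    else
        return false;
    return true;
}
```

With this change, a test like "if (opts->csv_mode)" becomes
"if (opts->format == COPY_FORMAT_CSV)". A single switch over the enum
can then handle every format in one place. The invalid state where both
"binary" and "csv_mode" are true also can no longer be represented.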

Best regards,

Joel

Attachment Content-Type Size
v1-0001-Replace-binary-flags-binary-and-csv_mode-with-format.patch application/octet-stream 18.0 KB
