Re: Should CSV parsing be stricter about mid-field quotes?

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
Cc: "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Noah Misch" <noah(at)leadboat(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Should CSV parsing be stricter about mid-field quotes?
Date: 2024-10-11 19:53:09
Message-ID: 43e1e852-e3ba-4f24-a72b-72224acdbea4@app.fastmail.com
Lists: pgsql-hackers

On Fri, Oct 11, 2024, at 15:04, Joel Jacobson wrote:
> On Thu, Oct 10, 2024, at 10:37, Daniel Verite wrote:
>> Joel Jacobson wrote:
>>
>>> - No Headers or Metadata:
>>
>> It's not clear why it's necessary to disable the HEADER option
>> for this format?
>
> It's not necessary, no; I just couldn't see a use case,
> since I was only thinking about the COPY FROM case,
> where one would be dealing with unstructured, undelimited
> text files, such as log files coming from some other system,
> and I've never seen those have header rows.
>
> However, thanks to your question, I see how a user
> might want to use the raw format to export a text
> column "as is" using COPY TO, in which case it would
> be useful to use HEADER and then HEADER MATCH
> for COPY FROM.
>
> I therefore think the HEADER option should be supported
> for the new raw format.
>
>>> The format does not support header rows or end-of-data markers;
>>> every line is treated as data.
>>
>> With COPY FROM STDIN followed by inline data in a script,
>> an end-of-data marker is required. That's also a problem
>> for CSV except it's mitigated by the possibility of quoting
>> (using "\." instead of \.)
>
> Right. As long as \. won't have any special meaning for the raw format
> except in the STDIN case, that seems fine.
>
> I haven't looked at that part of the code in detail yet though.
>
> As a preparatory step, I think we should replace the two
> "binary" and "csv_mode" bool fields in CopyFormatOptions
> with a single "format" field of a new CopyFormat enum type.
>
> If we instead introduced yet another bool field, I think the code
> would become too cluttered.
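
To make the idea concrete, here is a minimal standalone sketch of what such
an enum could look like. It is not taken from any patch: the enum member
names, the CopyFormatOptionsSketch struct and the two helper functions are
all hypothetical, and the real CopyFormatOptions has many more fields.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical replacement for the "binary" and "csv_mode" bools.
 * Member names are illustrative assumptions, not committed code.
 */
typedef enum CopyFormat
{
    COPY_FORMAT_TEXT = 0,       /* default text format */
    COPY_FORMAT_BINARY,
    COPY_FORMAT_CSV,
    COPY_FORMAT_RAW             /* the proposed new format */
} CopyFormat;

/* Simplified stand-in for CopyFormatOptions (assumption). */
typedef struct CopyFormatOptionsSketch
{
    CopyFormat  format;         /* one field instead of several bools */
} CopyFormatOptionsSketch;

/*
 * Map a FORMAT option value to the enum.
 * Returns false if the name is not recognized.
 */
static bool
copy_format_from_name(const char *name, CopyFormat *format)
{
    if (strcmp(name, "text") == 0)
        *format = COPY_FORMAT_TEXT;
    else if (strcmp(name, "binary") == 0)
        *format = COPY_FORMAT_BINARY;
    else if (strcmp(name, "csv") == 0)
        *format = COPY_FORMAT_CSV;
    else if (strcmp(name, "raw") == 0)
        *format = COPY_FORMAT_RAW;
    else
        return false;
    return true;
}

/* Map the enum back to its name, e.g. for error messages. */
static const char *
copy_format_name(CopyFormat format)
{
    switch (format)
    {
        case COPY_FORMAT_TEXT:
            return "text";
        case COPY_FORMAT_BINARY:
            return "binary";
        case COPY_FORMAT_CSV:
            return "csv";
        case COPY_FORMAT_RAW:
            return "raw";
    }
    return "unknown";
}
```

With a single field, checks such as "if (opts->csv_mode)" would become
"if (opts->format == COPY_FORMAT_CSV)", and adding a further format would
mean adding one enum member rather than one more bool to keep consistent
with the others.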

I'm starting a new thread for this with a more suitable subject.

/Joel
