From: | "Joel Jacobson" <joel(at)compiler(dot)org> |
---|---|
To: | "Daniel Verite" <daniel(at)manitou-mail(dot)org> |
Cc: | "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Noah Misch" <noah(at)leadboat(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Should CSV parsing be stricter about mid-field quotes? |
Date: | 2024-10-11 19:53:09 |
Message-ID: | 43e1e852-e3ba-4f24-a72b-72224acdbea4@app.fastmail.com |
Lists: | pgsql-hackers |
On Fri, Oct 11, 2024, at 15:04, Joel Jacobson wrote:
> On Thu, Oct 10, 2024, at 10:37, Daniel Verite wrote:
>> Joel Jacobson wrote:
>>
>>> - No Headers or Metadata:
>>
>> It's not clear why it's necessary to disable the HEADER option
>> for this format?
>
> It's not necessary, no; I just couldn't see a use-case,
> since I only thought about the COPY FROM case
> where one would be dealing with unstructured, undelimited
> text files, such as log files coming from some other system,
> which I have never seen contain header rows.
>
> However, thanks to your question, I see how a user
> might want to use the raw format to export a text
> column "as is" using COPY TO, in which case it would
> be useful to use HEADER and then HEADER MATCH
> for COPY FROM.
>
> I therefore think the HEADER option should be supported
> for the new raw format.
>
>>> The format does not support header rows or end-of-data markers;
>>> every line is treated as data.
>>
>> With COPY FROM STDIN followed by inline data in a script,
>> an end-of-data marker is required. That's also a problem
>> for CSV except it's mitigated by the possibility of quoting
>> (using "\." instead of \.)
>
> Right. As long as \. won't have any special meaning for the raw format
> except in the STDIN case, that seems fine.
>
> I haven't looked at that part of the code in detail yet though.
>
> As a preparatory step, I think we should replace the two
> "binary" and "csv_mode" bool fields in CopyFormatOptions
> with a single "format" field of a new CopyFormat enum type.
>
> If we instead introduced another bool field, I think the code
> would become too cluttered.
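
For illustration, here is a minimal sketch of what replacing the two bools
with an enum might look like. The enum member names, the trimmed
CopyFormatOptionsSketch struct, and both helper functions are hypothetical
and are not taken from the PostgreSQL source tree; the "raw" member only
reflects the format proposed in this thread.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical enum: one value per COPY format instead of one bool each. */
typedef enum CopyFormat
{
    COPY_FORMAT_TEXT = 0,
    COPY_FORMAT_BINARY,
    COPY_FORMAT_CSV,
    COPY_FORMAT_RAW             /* the format proposed in this thread */
} CopyFormat;

/* Trimmed stand-in for CopyFormatOptions; the real struct has many more fields. */
typedef struct CopyFormatOptionsSketch
{
    CopyFormat  format;         /* would replace "binary" and "csv_mode" */
} CopyFormatOptionsSketch;

/* Map a FORMAT option string to the enum; returns false for unknown names. */
static bool
parse_copy_format(const char *name, CopyFormat *out)
{
    if (strcmp(name, "text") == 0)
        *out = COPY_FORMAT_TEXT;
    else if (strcmp(name, "binary") == 0)
        *out = COPY_FORMAT_BINARY;
    else if (strcmp(name, "csv") == 0)
        *out = COPY_FORMAT_CSV;
    else if (strcmp(name, "raw") == 0)
        *out = COPY_FORMAT_RAW;
    else
        return false;
    return true;
}

/* A single switch replaces chains of "if (binary) ... else if (csv_mode) ...". */
static const char *
copy_format_name(CopyFormat format)
{
    switch (format)
    {
        case COPY_FORMAT_TEXT:
            return "text";
        case COPY_FORMAT_BINARY:
            return "binary";
        case COPY_FORMAT_CSV:
            return "csv";
        case COPY_FORMAT_RAW:
            return "raw";
    }
    return "unknown";
}
```

With an enum, adding a further format means adding one member and one case
per switch, rather than another bool whose combinations with the existing
ones all have to be checked.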
I'm starting a new thread for this with a more suitable subject.
/Joel