From: | "Joel Jacobson" <joel(at)compiler(dot)org> |
---|---|
To: | "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com> |
Cc: | "Daniel Verite" <daniel(at)manitou-mail(dot)org>, "Aleksander Alekseev" <aleksander(at)timescale(dot)com>, "PostgreSQL Hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: New "single" COPY format |
Date: | 2024-11-09 07:07:33 |
Message-ID: | 2ed704e9-cee1-450c-aa84-216af23a3a86@app.fastmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Nov 8, 2024, at 22:47, David G. Johnston wrote:
> On Fri, Nov 8, 2024 at 2:20 PM Joel Jacobson <joel(at)compiler(dot)org> wrote:
>>
>> 1. Text files containing \. in the middle of the file
>> % cat /tmp/test.txt
>> foo
>> \.
>> bar
>>
>> Or another option to turn off the special meaning of \.?
>
> This does seem like an orthogonal option worth considering.
I agree; if we want to integrate this into 'text', it's an option worth considering.
> I'm on board with a new named format that selects the desired defaults
> instead of requiring the user to know and change them all manually.
>
> This seems to me like a "list" format. Implying each row is a list
> entry. Since we have tables the concept of list would likewise
> reasonably imply a single column.
>
> Since newlines are special, i.e., record delimiters, "plain" thus would
> remain misleading. It could be used for a case where the entire file
> is loaded into a new row, single column.
I think 'list' is the best proposal I've heard so far.
New patch attached, only change since v18 is the renaming.
There is one remaining important issue though:
Fields that contain newline characters, cause an irreversibility problem.
It feels wrong to leave this as a potential pitfall for users.
Here's a draft of an idea I'm considering (not yet implemented):
- Fast path for newline-free types:
For the list of built-in types where we know the ::text representation cannot
contain newlines, we take the fast path in NextCopyFromRawFields(),
pointing cstate->raw_fields[0] directly to cstate->line_buf.data.
- Handling newlines for other types:
For any other types, we would need to scan the string for newline characters.
If a newline is encountered, it would, by default, result in an error when
using the list format, unless:
- Optional quoting mechanism:
If the QUOTE option is specified, we can allow newlines within fields by
quoting the entire line. Any quote characters within the field would be
handled by doubling them, similar to CSV escaping rules.
/Joel
Attachment | Content-Type | Size |
---|---|---|
v19-0001-Introduce-CopyFormat-and-replace-csv_mode-and-binary.patch | application/octet-stream | 18.8 KB |
v19-0002-Add-COPY-format-list.patch | application/octet-stream | 32.2 KB |
v19-0003-Reorganize-option-validations.patch | application/octet-stream | 19.7 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Dilip Kumar | 2024-11-09 07:11:18 | Re: Fix small typo, use InvalidRelFileNumber instead of InvalidOid |
Previous Message | Tatsuo Ishii | 2024-11-09 06:25:59 | Re: Fix for Extra Parenthesis in pgbench progress message |