From: | "Joel Jacobson" <joel(at)compiler(dot)org> |
---|---|
To: | "Daniel Verite" <daniel(at)manitou-mail(dot)org>, "Aleksander Alekseev" <aleksander(at)timescale(dot)com> |
Cc: | "PostgreSQL Hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: New "single" COPY format |
Date: | 2024-11-08 21:19:19 |
Message-ID: | 7abe064b-f660-465d-a522-341a325fe530@app.fastmail.com |
Lists: | pgsql-hackers |
On Fri, Nov 8, 2024, at 20:44, Daniel Verite wrote:
> Aleksander Alekseev wrote:
>
>> IMO it should be 'text' we already have with special options e.g.
>> DELIMITER AS NULL ESCAPE AS NULL. If there are no escape characters
>> and column delimiters (and no NULLs designations, and what else I
>> forgot) then your text file just contains one tuple per line.
>
> +1 for the idea that accepting "no delimiter" and "no escape"
> as a valid combination for the text format seems better
> than adding a new format.
> However inviting "NULL" into that syntax when it has nothing to do
> with the SQL "NULL" does not look like a good idea.
> Maybe DELIMITER '' ESCAPE '', or DELIMITER NONE ESCAPE NONE.
Okay, let's see if we can solve all problems I see with
overloading the 'text' format:
1. Text files containing \. in the middle of the file
% cat /tmp/test.txt
foo
\.
bar
How do we import such a file?
Is it not supported?
Or do we add another option to turn off the special meaning of \.?
Both seem like bad ideas to me; maybe there is a nice idea I fail to see?
2. NULL option is \N for 'text', so to import a plain text
file safely, where \N lines should not be converted to NULL,
users would need to also specify NULL '', which seems
like a footgun to me.
3. What should happen if specifying DELIMITER NONE, and:
- specifying a column list with more than one column?
- not also specifying ESCAPE NONE?
4. What should happen if specifying ESCAPE NONE, and:
- specifying a column list with more than one column?
5. What about the isomorphism violation I brought up in my
previous email, that is, the non-bijective mapping and irreversibility
for records with embedded newlines?
This is also a problem with a separate format,
but I wonder what you think about the problem,
if it's acceptable, or needs to be solved, and if so,
if you see any solutions.
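To make points 1, 2 and 5 concrete, here is a minimal Python sketch. It is a toy model, not PostgreSQL's COPY code: the names read_text_like and write_unescaped are hypothetical, and the reader only assumes that a line containing just \. ends the data and that a line equal to the NULL marker (\N by default) becomes NULL. It does no escape processing at all.

```python
# Toy model only: NOT PostgreSQL's implementation. The helper names are
# hypothetical and exist purely to illustrate points 1, 2 and 5.

END_OF_DATA = "\\."   # a line containing just backslash-period
DEFAULT_NULL = "\\N"  # default NULL marker of the 'text' format


def read_text_like(raw, null_marker=DEFAULT_NULL):
    """Read one value per line, honoring the end-of-data and NULL markers."""
    rows = []
    for line in raw.splitlines():
        if line == END_OF_DATA:
            break  # point 1: everything after this line is never read
        # point 2: a line equal to the marker becomes NULL (None here)
        rows.append(None if line == null_marker else line)
    return rows


def write_unescaped(values):
    """Hypothetical no-escape writer: one value per line, nothing escaped."""
    return "".join(value + "\n" for value in values)


# Point 1: the /tmp/test.txt example loses everything after the \. line.
print(read_text_like("foo\n\\.\nbar\n"))

# Point 2: a literal \N line becomes NULL with the default marker.
print(read_text_like("a\n\\N\n\nb\n"))
# Overriding the marker with '' keeps \N, but now the empty line is NULL.
print(read_text_like("a\n\\N\n\nb\n", null_marker=""))

# Point 5: a value with an embedded newline does not round-trip.
original = ["line one\nline two", "x"]
print(read_text_like(write_unescaped(original)))
```

In this model, the two-value input of the last example comes back as three values, which is the irreversibility from point 5; it appears regardless of whether the behavior lives in 'text' with extra options or in a separate format.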
> Besides, "single" as a format name does not sound right.
> Generally the name for a text format designates a set
> of characteristics meaning that certain combinations of
> characters have specific behaviors.
> Sometimes "plain" is used in the context of text formats
> to indicate that no character is special ("plain" is also the
> default subtype of "text" in MIME types).
>
> "single" as proposed is to be understood as "single-column",
> which is a consequence of the lack of a field delimiter, but
> not an intrinsic characteristic of the format.
> If COPY accepted fixed-length fields, it could be in a
> no-delimiter no-escape mode and still handle multiple
> columns, in opposition to what "single" suggests.
Good points. I agree "plain" is a better name.
/Joel