Re: New "raw" COPY format

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
Cc: "Jacob Champion" <jacob(dot)champion(at)enterprisedb(dot)com>, "Tatsuo Ishii" <ishii(at)postgresql(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: New "raw" COPY format
Date: 2024-10-16 18:30:47
Message-ID: 5d7c38c6-e3de-4861-b045-0cf4abee1a14@app.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 16, 2024, at 18:34, Daniel Verite wrote:
> Joel Jacobson wrote:
>
>> However, I thinking rejecting such column data seems like the
>> better alternative, to ensure data exported with COPY TO
>> can always be imported back using COPY FROM,
>> for the same format.
>
> On the other hand, that might prevent cases where we
> want to export, for instance, a valid json array:
>
> copy (select json_agg(col) from table ) to 'file' RAW
>
> This is a variant of the discussion in [1] where the OP does:
>
> copy (select json_agg(row_to_json(t)) from <query> t) TO 'file'
>
> and he complains that both text and csv "break the JSON".
> That discussion morphed into a proposed patch adding JSON
> format to COPY, but RAW would work directly as the OP
> expected.
>
> That is, unless <query> happens to include JSON fields with LF/CRLF
> in them, and the RAW format says this is an error condition.
> In that case it's quite annoying to make it an error, rather than
> simply let it pass.
>
>
> [1]
> https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU=kcg@mail.gmail.com

Thanks for finding this related thread.

Very good example, that would be solved with RAW.

I've used a different hack myself many times to export text "as is",
which is to use COPY TO ... BINARY, and then to manually edit the
file using vim, stripping away the header and footer from the file. :D

But I don't see how JSON fields that come from PostgreSQL
could contain LF/CRLF though, could they?
Since LF/CR must be escaped inside a JSON field,
and when casting JSONB to text, there are no newlines
injected anywhere in between fields.
I can only see how a value of the legacy JSON type could
have newlines in it.

That doesn't matter though, this is still a very good example
on the need to export text "as is" from PostgreSQL to a file.

I like Jacob's idea of letting the user specify a DELIMITER
also for the RAW format, to override the automagical
newline detection. This would e.g. allow importing values
containing e.g. \r when the newline delimiter is \n,
which would otherwise be reject with an
"inconsistent newline style" error.

I think it would also be useful to allow specifying
DELIMITER NONE, which would allow importing an
entire text file, "as is", into a single column and single row.

To export a single column single row, the delimiter
wouldn't matter, since there wouldn't be any.
But DELIMITER NONE would be useful also for COPY TO,
if wanting to concatenate text "as is" without adding
newlines in between, to a file, similar to the
UNIX "cat" command.

Regarding the name for the format, I thought SINGLE
was nice before I read this message, but now since
the need for more rawness seems desirable,
I think I like RAW the most again, since if the
automagical newline detection can be overridden,
then it's really raw for real.

A final thought is to maybe consider just skipping
the automagical newline detection for RAW?

Instead of the automagical detection,
the default newline delimiter could be the OS default,
similar to how COPY TO works.

That way, it would almost always just work for most users,
as long as processing files within their OS,
and when not, they would just need to specify the DELIMITER.

/Joel

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2024-10-16 18:52:04 Re: Limiting overshoot in nbtree's parallel SAOP index scans
Previous Message Nikita Malakhov 2024-10-16 18:26:57 Re: Considering fractional paths in Append node