Re: New "raw" COPY format

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: "Tatsuo Ishii" <ishii(at)postgresql(dot)org>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: New "raw" COPY format
Date: 2024-10-14 08:07:50
Message-ID: fb67fa34-0e62-4a0a-9bc2-d9d33a5bce6a@app.fastmail.com
Lists: pgsql-hackers

On Sun, Oct 13, 2024, at 14:39, Joel Jacobson wrote:
> On Sun, Oct 13, 2024, at 11:52, Tatsuo Ishii wrote:
>> After COPY has imported the "unstructured text file" in "raw" COPY format,
>> what is the column type? text? or bytea? If it's text, how do you
>> handle encoding conversion if the "unstructured text file" is encoded
>> in a server-side-unsafe encoding such as SJIS?
>>
>>> All characters are taken literally.
>>> There is no special handling for quotes, backslashes, or escape sequences.
>>
>> If SJIS text is imported "literally" (i.e. no encoding conversion), it
>> should be rejected.
>
> I think encoding conversion is still necessary,
> and should work the same as for the COPY formats "text" and "csv".

Attached is a first draft implementation of the new proposed COPY "raw" format.

The first two patches are just the fix for a bug in HEAD, reported separately:
https://commitfest.postgresql.org/50/5297/

* v4-0001-Fix-thinko-in-tests-for-COPY-options-force_not_null-.patch
The first patch fixes a thinko in tests for COPY options force_not_null and force_null.

* v4-0002-Fix-validation-of-FORCE_NOT_NULL-FORCE_NULL-for-all-.patch
The second patch fixes validation of FORCE_NOT_NULL/FORCE_NULL for all-columns case.

* v4-0003-Replace-binary-flags-binary-and-csv_mode-with-format.patch
The third patch introduces a new enum CopyFormat, with options for the three current formats.

* v4-0004-Reorganize-ProcessCopyOptions-for-clarity-and-consis.patch
The fourth patch reorganizes ProcessCopyOptions for clarity and consistent option handling.

* v4-0005-Add-raw-COPY-format-support-for-unstructured-text-da.patch
Finally, the fifth patch introduces the new "raw" COPY format.
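
To illustrate the idea behind the third patch, here is a minimal standalone sketch of how a single format enum can replace two boolean flags such as binary and csv_mode. The enumerator names and both helper functions are assumptions made for illustration and are not taken from the patch; only the enum name CopyFormat and the three existing formats come from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical enumerator names; the patch's actual spelling may differ. */
typedef enum CopyFormat
{
    COPY_FORMAT_TEXT = 0,
    COPY_FORMAT_BINARY,
    COPY_FORMAT_CSV
} CopyFormat;

/*
 * Hypothetical helper: map a FORMAT option string to the enum.
 * Returns false for an unrecognized name and leaves *out untouched.
 */
static bool
parse_copy_format(const char *name, CopyFormat *out)
{
    assert(name != NULL && out != NULL);

    if (strcmp(name, "text") == 0)
        *out = COPY_FORMAT_TEXT;
    else if (strcmp(name, "binary") == 0)
        *out = COPY_FORMAT_BINARY;
    else if (strcmp(name, "csv") == 0)
        *out = COPY_FORMAT_CSV;
    else
        return false;
    return true;
}

/*
 * Hypothetical helper: with one enum, a format-specific check becomes a
 * single switch instead of a combination of boolean tests.
 */
static bool
format_allows_quote_option(CopyFormat format)
{
    switch (format)
    {
        case COPY_FORMAT_CSV:
            return true;
        case COPY_FORMAT_TEXT:
        case COPY_FORMAT_BINARY:
            return false;
    }
    return false;
}
```

With this shape, adding a fourth format means adding one enumerator and one case per switch, rather than a third boolean whose combinations with the other two must be ruled out by hand.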

Docs and tests updated.

The raw format currently goes through the same multiple processing stages
as the text and CSV formats. I'm not sure what the best approach would be
if we wanted to create a special fast parsing path for it.
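
For discussion, a rough sketch of what such a fast path could look like in isolation. It is not taken from the patch and rests on several assumptions: records are separated by a single '\n', every other byte is literal (no quotes, backslashes, or escape sequences), and the buffer has already been converted to a server-safe encoding. The names raw_scan_lines, raw_row_callback, and sum_lengths are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: receives one literal record, not NUL-terminated. */
typedef void (*raw_row_callback) (const char *data, size_t len, void *arg);

/*
 * Split buf into records at each '\n' and hand every record to cb unchanged.
 * A final record without a trailing newline is still emitted.
 * Returns the number of records found.
 */
static size_t
raw_scan_lines(const char *buf, size_t len, raw_row_callback cb, void *arg)
{
    size_t      nrows = 0;
    const char *p;
    const char *end;

    if (len == 0)
        return 0;
    assert(buf != NULL);

    p = buf;
    end = buf + len;
    while (p < end)
    {
        /* One memchr per record; no per-character state machine. */
        const char *nl = memchr(p, '\n', (size_t) (end - p));
        const char *stop = (nl != NULL) ? nl : end;

        if (cb != NULL)
            cb(p, (size_t) (stop - p), arg);
        nrows++;

        if (nl == NULL)
            break;
        p = nl + 1;
    }
    return nrows;
}

/* Example callback: accumulate the total number of record bytes seen. */
static void
sum_lengths(const char *data, size_t len, void *arg)
{
    (void) data;
    *(size_t *) arg += len;
}
```

The point of the sketch is only that, once nothing needs unescaping, record splitting reduces to a search for the delimiter; how that would fit into the existing staged COPY code is left open.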

/Joel

Attachment                                                          Content-Type              Size
v4-0001-Fix-thinko-in-tests-for-COPY-options-force_not_null-.patch  application/octet-stream  4.6 KB
v4-0002-Fix-validation-of-FORCE_NOT_NULL-FORCE_NULL-for-all-.patch  application/octet-stream  5.0 KB
v4-0003-Replace-binary-flags-binary-and-csv_mode-with-format.patch  application/octet-stream  18.6 KB
v4-0004-Reorganize-ProcessCopyOptions-for-clarity-and-consis.patch  application/octet-stream  19.9 KB
v4-0005-Add-raw-COPY-format-support-for-unstructured-text-da.patch  application/octet-stream  30.7 KB
