From: | "Joel Jacobson" <joel(at)compiler(dot)org> |
---|---|
To: | "Tatsuo Ishii" <ishii(at)postgresql(dot)org> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: New "raw" COPY format |
Date: | 2024-10-14 08:07:50 |
Message-ID: | fb67fa34-0e62-4a0a-9bc2-d9d33a5bce6a@app.fastmail.com |
Lists: | pgsql-hackers |
On Sun, Oct 13, 2024, at 14:39, Joel Jacobson wrote:
> On Sun, Oct 13, 2024, at 11:52, Tatsuo Ishii wrote:
>> After copy imported the "unstructured text file" in "raw" COPY format,
>> what the column type is? text? or bytea? If it's text, how do you
>> handle encoding conversion if the "unstructured text file" is encoded
>> in server side unsafe encoding such as SJIS?
>>
>>> All characters are taken literally.
>>> There is no special handling for quotes, backslashes, or escape sequences.
>>
>> If SJIS text is imported "literally" (i.e. no encoding conversion), it
>> should be rejected.
>
> I think encoding conversion is still necessary,
> and should work the same as for the COPY formats "text" and "csv".
Attached is a first draft implementation of the new proposed COPY "raw" format.
The first two patches are just a bug fix for HEAD, reported separately:
https://commitfest.postgresql.org/50/5297/
* v4-0001-Fix-thinko-in-tests-for-COPY-options-force_not_null-.patch
The first patch fixes a thinko in tests for COPY options force_not_null and force_null.
* v4-0002-Fix-validation-of-FORCE_NOT_NULL-FORCE_NULL-for-all-.patch
The second patch fixes validation of FORCE_NOT_NULL/FORCE_NULL for all-columns case.
* v4-0003-Replace-binary-flags-binary-and-csv_mode-with-format.patch
The third patch introduces a new enum CopyFormat, with options for the three current formats.
* v4-0004-Reorganize-ProcessCopyOptions-for-clarity-and-consis.patch
The fourth patch reorganizes ProcessCopyOptions for clarity and consistent option handling.
* v4-0005-Add-raw-COPY-format-support-for-unstructured-text-da.patch
Finally, the fifth patch introduces the new "raw" COPY format.
Docs and tests updated.
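To illustrate the idea behind the third patch, here is a minimal standalone sketch of replacing two boolean flags (binary and csv_mode) with a single format enum. It is not the patch's actual code: the enumerator names and the helper parse_copy_format are assumptions made for illustration only.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical enum covering the three current COPY formats.
 * The names are assumed for this sketch; the real patch may differ.
 */
typedef enum CopyFormat
{
    COPY_FORMAT_TEXT = 0,
    COPY_FORMAT_BINARY,
    COPY_FORMAT_CSV
} CopyFormat;

/*
 * Hypothetical helper: map a FORMAT option value to the enum.
 * Returns false for an unrecognized name and leaves *out untouched.
 */
static bool
parse_copy_format(const char *name, CopyFormat *out)
{
    if (strcmp(name, "text") == 0)
        *out = COPY_FORMAT_TEXT;
    else if (strcmp(name, "binary") == 0)
        *out = COPY_FORMAT_BINARY;
    else if (strcmp(name, "csv") == 0)
        *out = COPY_FORMAT_CSV;
    else
        return false;
    return true;
}
```

With a single enum, adding a fourth format would presumably mean one new enumerator and one new branch, rather than a third boolean whose combinations with the existing two have to be validated.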
The raw format currently goes through the same multiple processing stages
as the text and CSV formats. If we want a special fast parsing path for it,
I'm not sure what the best approach would be.
/Joel
Attachment | Content-Type | Size |
---|---|---|
v4-0001-Fix-thinko-in-tests-for-COPY-options-force_not_null-.patch | application/octet-stream | 4.6 KB |
v4-0002-Fix-validation-of-FORCE_NOT_NULL-FORCE_NULL-for-all-.patch | application/octet-stream | 5.0 KB |
v4-0003-Replace-binary-flags-binary-and-csv_mode-with-format.patch | application/octet-stream | 18.6 KB |
v4-0004-Reorganize-ProcessCopyOptions-for-clarity-and-consis.patch | application/octet-stream | 19.9 KB |
v4-0005-Add-raw-COPY-format-support-for-unstructured-text-da.patch | application/octet-stream | 30.7 KB |