From: | Sutou Kouhei <kou(at)clear-code(dot)com> |
---|---|
To: | sawada(dot)mshk(at)gmail(dot)com |
Cc: | michael(at)paquier(dot)xyz, zhjwpku(at)gmail(dot)com, andrew(at)dunslane(dot)net, nathandbossart(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Make COPY format extendable: Extract COPY TO format implementations |
Date: | 2023-12-21 09:35:04 |
Message-ID: | 20231221.183504.1240642084042888377.kou@clear-code.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
In <CAD21AoCunywHird3GaPzWe6s9JG1wzxj3Cr6vGN36DDheGjOjA(at)mail(dot)gmail(dot)com>
"Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 11 Dec 2023 23:31:29 +0900,
Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> I've sketched the above idea including a test module in
> src/test/module/test_copy_format, based on v2 patch. It's not splitted
> and is dirty so just for discussion.
I implemented a sample COPY TO handler for Apache Arrow that
supports only integer and text.
I needed to extend the patch:
1. Add an opaque space for custom COPY TO handler
* Add CopyToState{Get,Set}Opaque()
https://github.com/kou/postgres/commit/5a610b6a066243f971e029432db67152cfe5e944
2. Export CopyToState::attnumlist
* Add CopyToStateGetAttNumList()
https://github.com/kou/postgres/commit/15fcba8b4e95afa86edb3f677a7bdb1acb1e7688
3. Export CopySend*()
* Rename CopySend*() to CopyToStateSend*() and export them
* Exception: CopySendEndOfRow() to CopyToStateFlush() because
it just flushes the internal buffer now.
https://github.com/kou/postgres/commit/289a5640135bde6733a1b8e2c412221ad522901e
The attached patch is based on the Sawada-san's patch and
includes the above changes. Note that this patch is also
dirty so just for discussion.
My suggestions from this experience:
1. Split COPY handler to COPY TO handler and COPY FROM handler
* CopyFormatRoutine is a bit tricky. An extension needs
to create a CopyFormatRoutine node and
a CopyToFormatRoutine node.
* If we just require "copy_to_${FORMAT}(internal)"
function and "copy_from_${FORMAT}(internal)" function,
we can remove the tricky approach. And it also avoid
name collisions with other handler such as tablesample
handler.
See also:
https://www.postgresql.org/message-id/flat/20231214.184414.2179134502876898942.kou%40clear-code.com#af71f364d0a9f5c144e45b447e5c16c9
2. Need an opaque space like IndexScanDesc::opaque does
* A custom COPY TO handler needs to keep its data
3. Export CopySend*()
* If we like minimum API, we just need to export
CopySendData() and CopySendEndOfRow(). But
CopySend{String,Char,Int32,Int16}() will be convenient
custom COPY TO handlers. (A custom COPY TO handler for
Apache Arrow doesn't need them.)
Questions:
1. What value should be used for "format" in
PgMsg_CopyOutResponse message?
It's 1 for binary format and 0 for text/csv format.
Should we make it customizable by custom COPY TO handler?
If so, what value should be used for this?
2. Do we need more tries for design discussion for the first
implementation? If we need, what should we try?
Thanks,
--
kou
Attachment | Content-Type | Size |
---|---|---|
v2-custom_copy_format_draft.diff | text/x-patch | 36.7 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Pavel Stehule | 2023-12-21 09:35:47 | Re: Autonomous transactions 2023, WIP |
Previous Message | shveta malik | 2023-12-21 09:31:59 | Re: Synchronizing slots from primary to standby |