Re: Make COPY format extendable: Extract COPY TO format implementations

From: Junwang Zhao <zhjwpku(at)gmail(dot)com>
To: Sutou Kouhei <kou(at)clear-code(dot)com>
Cc: sawada(dot)mshk(at)gmail(dot)com, michael(at)paquier(dot)xyz, andrew(at)dunslane(dot)net, nathandbossart(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2023-12-22 02:58:05
Message-ID: CAEG8a3+jG_NKOUmcxDyEX2xSggBXReZ4H=e3RFsUtedY88A03w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Dec 21, 2023 at 5:35 PM Sutou Kouhei <kou(at)clear-code(dot)com> wrote:
>
> Hi,
>
> In <CAD21AoCunywHird3GaPzWe6s9JG1wzxj3Cr6vGN36DDheGjOjA(at)mail(dot)gmail(dot)com>
> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 11 Dec 2023 23:31:29 +0900,
> Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> > I've sketched the above idea including a test module in
> > src/test/module/test_copy_format, based on v2 patch. It's not splitted
> > and is dirty so just for discussion.
>
> I implemented a sample COPY TO handler for Apache Arrow that
> supports only integer and text.
>
> I needed to extend the patch:
>
> 1. Add an opaque space for custom COPY TO handler
> * Add CopyToState{Get,Set}Opaque()
> https://github.com/kou/postgres/commit/5a610b6a066243f971e029432db67152cfe5e944
>
> 2. Export CopyToState::attnumlist
> * Add CopyToStateGetAttNumList()
> https://github.com/kou/postgres/commit/15fcba8b4e95afa86edb3f677a7bdb1acb1e7688
>
> 3. Export CopySend*()
> * Rename CopySend*() to CopyToStateSend*() and export them
> * Exception: CopySendEndOfRow() to CopyToStateFlush() because
> it just flushes the internal buffer now.
> https://github.com/kou/postgres/commit/289a5640135bde6733a1b8e2c412221ad522901e
>
I guess the purpose of these helpers is to avoid expose CopyToState to
copy.h, but I
think expose CopyToState to user might make life easier, users might want to use
the memory contexts of the structure (though I agree not all the
fields are necessary
for extension handers).

> The attached patch is based on the Sawada-san's patch and
> includes the above changes. Note that this patch is also
> dirty so just for discussion.
>
> My suggestions from this experience:
>
> 1. Split COPY handler to COPY TO handler and COPY FROM handler
>
> * CopyFormatRoutine is a bit tricky. An extension needs
> to create a CopyFormatRoutine node and
> a CopyToFormatRoutine node.
>
> * If we just require "copy_to_${FORMAT}(internal)"
> function and "copy_from_${FORMAT}(internal)" function,
> we can remove the tricky approach. And it also avoid
> name collisions with other handler such as tablesample
> handler.
> See also:
> https://www.postgresql.org/message-id/flat/20231214.184414.2179134502876898942.kou%40clear-code.com#af71f364d0a9f5c144e45b447e5c16c9
>
> 2. Need an opaque space like IndexScanDesc::opaque does
>
> * A custom COPY TO handler needs to keep its data

I once thought users might want to parse their own options, maybe this
is a use case for this opaque space.

For the name, I thought private_data might be a better candidate than
opaque, but I do not insist.
>
> 3. Export CopySend*()
>
> * If we like minimum API, we just need to export
> CopySendData() and CopySendEndOfRow(). But
> CopySend{String,Char,Int32,Int16}() will be convenient
> custom COPY TO handlers. (A custom COPY TO handler for
> Apache Arrow doesn't need them.)

Do you use the arrow library to control the memory? Is there a way that
we can let the arrow use postgres' memory context? I'm not sure this
is necessary, just raise the question for discussion.
>
> Questions:
>
> 1. What value should be used for "format" in
> PgMsg_CopyOutResponse message?
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/commands/copyto.c;h=c66a047c4a79cc614784610f385f1cd0935350f3;hb=9ca6e7b9411e36488ef539a2c1f6846ac92a7072#l144
>
> It's 1 for binary format and 0 for text/csv format.
>
> Should we make it customizable by custom COPY TO handler?
> If so, what value should be used for this?
>
> 2. Do we need more tries for design discussion for the first
> implementation? If we need, what should we try?
>
>
> Thanks,
> --
> kou

+PG_FUNCTION_INFO_V1(copy_testfmt_handler);
+Datum
+copy_testfmt_handler(PG_FUNCTION_ARGS)
+{
+ bool is_from = PG_GETARG_BOOL(0);
+ CopyFormatRoutine *cp = makeNode(CopyFormatRoutine);;
+

extra semicolon.

--
Regards
Junwang Zhao

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Lakhin 2023-12-22 05:00:00 Re: trying again to get incremental backup
Previous Message Amit Kapila 2023-12-22 02:50:25 Re: Track in pg_replication_slots the reason why slots conflict?