Re: Make COPY format extendable: Extract COPY TO format implementations

From: Sutou Kouhei <kou(at)clear-code(dot)com>
To: sawada(dot)mshk(at)gmail(dot)com
Cc: zhjwpku(at)gmail(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2025-01-30 15:42:13
Message-ID: 20250131.004213.996398325029997587.kou@clear-code.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

In <CAD21AoDyBJrCsh5vNFWcRmS0_XKCCCP4gLzZnLCayYccLpaBfw(at)mail(dot)gmail(dot)com>
"Re: Make COPY format extendable: Extract COPY TO format implementations" on Tue, 28 Jan 2025 15:00:03 -0800,
Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:

> While 0001 and 0002 look good to me overall, we still need to polish
> subsequent patches. Here are review comments:

I attached the v29 patch set that applied your suggestions:

Refactoring:
0001-0002: There are some trivial changes (copyright year
change and some comment fixes)

COPY TO related:
0003: Applied your copyto_internal.h related,
CopyToGetRoutine() related and built-in CopyToRoutine
suggestions
0004: Applied your copyto_internal.h related suggestion
0005: No change

COPY FROM related:
0006: Applied your copyfrom_internal.h related,
CopyFromGetRoutine() related and built-in CopyFromRoutine
suggestions
0007: Applied your copyfrom_internal.h related suggestion
0008: Applied your CopyFromStateRead() related suggestion
0009: No change

> I still find that it would not be a good idea to move all copy-related
> struct definitions to copyapi.h because we need to include copyapi.h
> file into a .c file even if the file is not related to the custom copy
> format routines. I think that copyapi.h should have only the
> definitions of CopyToRoutine and CopyFromRoutine as well as some
> functions related to the custom copy format. Here is an idea:
>
> - CopyToState and CopyFromState are defined in copyto_internal.h (new
> file) and copyfrom_internal.h, respectively.
> - These two files #include's copy.h and other necessary header files.
> - copyapi.h has only CopyToRoutine and CopyFromRoutine and #include's
> both copyfrom_internal.h and copyto_internal.h.
> - copyto.c, copyfrom.c and copyfromparse.c #include copyapi.h
>
> Some advantages of this idea:
>
> - we can keep both CopyToState and CopyFromState private in _internal.h files.
> - custom format extension can include copyapi.h to provide a custom
> copy format routine and to access the copy state data.
> - copy-related .c files won't need to include copyapi.h if they don't
> use custom copy format routines.

Hmm. I thought Copy{To,From}State are "public" API not
"private" API for extensions. Because extensions need to use
at least Copy{To,From}State::opaque directly. If we want to
make Copy{To,From}State private, I think that we should
provide getter/setter for needed members of
Copy{To,From}State such as
Copy{To,From}State{Get,Set}Opaque().

It's a design in the v2 patch set:
https://www.postgresql.org/message-id/20231221.183504.1240642084042888377.kou%40clear-code.com

We discussed that we can make CopyToState public:
https://www.postgresql.org/message-id/CAD21AoD%3DUapH4Wh06G6H5XAzPJ0iJg9YcW8r7E2UEJkZ8QsosA%40mail.gmail.com

What does "private" mean here? I thought that it means that
"PostgreSQL itself can use it". But it seems that you mean
that "PostgreSQL itself and custom format extensions can use
it but other extensions can't use it".

I'm not familiar with "_internal.h" in PostgreSQL but is
"_internal.h" for the latter "private" mean?

> The 0008 patch introduces CopyFromStateRead(). While it would be a
> good start, I think we can consider sorting out low-level
> communication functions more. For example, CopyReadBinaryData() uses
> the internal 64kB buffer but some custom copy format extensions might
> want to use a larger buffer in its own implementation, which would
> require exposing CopyGetData() etc. Given that we might expose more
> functions to provide more ways for extensions, we might want to rename
> CopyFromStateRead().

This suggests that we just need a low-level CopyGetData()
not a high-level CopyReadBinaryData() as the first step,
right?

I agree that we should start from a minimal API set.

I've renamed CopyFromStateRead() to CopyFromStateGetData()
because it wraps CopyGetData() now.

> While we get the format routines for custom formats in
> ProcessCopyOptionFormat(), we do that for built-in formats in
> BeginCopyTo(), which seems odd to me. I think we can have
> CopyToGetRoutine() responsible for getting CopyToRoutine for built-in
> formats as well as custom format. The same is true for
> CopyFromRoutine.

I like the current design because we don't need to export
CopyToGetBuiltinRoutine() (we can use static for
CopyToGetBuiltinRoutine()) but I applied your
suggestion. Because it's not a strong opinion.

> Copy[To|From]Routine for built-in formats are missing to set the node type.

Oh, sorry. I missed this.

Thanks,
--
kou

Attachment Content-Type Size
v29-0001-Refactor-COPY-TO-to-use-format-callback-function.patch text/x-patch 17.9 KB
v29-0002-Refactor-COPY-FROM-to-use-format-callback-functi.patch text/x-patch 32.5 KB
v29-0003-Add-support-for-adding-custom-COPY-TO-format.patch text/x-patch 21.0 KB
v29-0004-Export-CopyToStateData-as-private-data.patch text/x-patch 9.5 KB
v29-0005-Add-support-for-implementing-custom-COPY-TO-form.patch text/x-patch 2.5 KB
v29-0006-Add-support-for-adding-custom-COPY-FROM-format.patch text/x-patch 13.2 KB
v29-0007-Use-COPY_SOURCE_-prefix-for-CopySource-enum-valu.patch text/x-patch 3.5 KB
v29-0008-Add-support-for-implementing-custom-COPY-FROM-fo.patch text/x-patch 2.4 KB
v29-0009-Add-CopyFromSkipErrorRow-for-custom-COPY-format-.patch text/x-patch 10.8 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2025-01-30 18:27:32 Re: JIT: The nullness of casetest.value can be determined at the JIT compile time.
Previous Message Alvaro Herrera 2025-01-30 15:29:35 Re: why there is not VACUUM FULL CONCURRENTLY?