Re: Make COPY format extendable: Extract COPY TO format implementations

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Sutou Kouhei <kou(at)clear-code(dot)com>
Cc: zhjwpku(at)gmail(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2025-01-31 22:25:34
Message-ID: CAD21AoBpWFU4k-_bwrTq0AkFSAdwQqhAsSW188STmu9HxLJ0nQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 30, 2025 at 7:42 AM Sutou Kouhei <kou(at)clear-code(dot)com> wrote:
>
> Hi,
>
> In <CAD21AoDyBJrCsh5vNFWcRmS0_XKCCCP4gLzZnLCayYccLpaBfw(at)mail(dot)gmail(dot)com>
> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Tue, 28 Jan 2025 15:00:03 -0800,
> Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> > While 0001 and 0002 look good to me overall, we still need to polish
> > subsequent patches. Here are review comments:
>
> I attached the v29 patch set that applied your suggestions:
>
> Refactoring:
> 0001-0002: There are some trivial changes (copyright year
> change and some comment fixes)
>
> COPY TO related:
> 0003: Applied your copyto_internal.h related,
> CopyToGetRoutine() related and built-in CopyToRoutine
> suggestions
> 0004: Applied your copyto_internal.h related suggestion
> 0005: No change
>
> COPY FROM related:
> 0006: Applied your copyfrom_internal.h related,
> CopyFromGetRoutine() related and built-in CopyFromRoutine
> suggestions
> 0007: Applied your copyfrom_internal.h related suggestion
> 0008: Applied your CopyFromStateRead() related suggestion
> 0009: No change
>
>
> > I still find that it would not be a good idea to move all copy-related
> > struct definitions to copyapi.h because we need to include copyapi.h
> > file into a .c file even if the file is not related to the custom copy
> > format routines. I think that copyapi.h should have only the
> > definitions of CopyToRoutine and CopyFromRoutine as well as some
> > functions related to the custom copy format. Here is an idea:
> >
> > - CopyToState and CopyFromState are defined in copyto_internal.h (new
> > file) and copyfrom_internal.h, respectively.
> > - These two files #include's copy.h and other necessary header files.
> > - copyapi.h has only CopyToRoutine and CopyFromRoutine and #include's
> > both copyfrom_internal.h and copyto_internal.h.
> > - copyto.c, copyfrom.c and copyfromparse.c #include copyapi.h
> >
> > Some advantages of this idea:
> >
> > - we can keep both CopyToState and CopyFromState private in _internal.h files.
> > - custom format extension can include copyapi.h to provide a custom
> > copy format routine and to access the copy state data.
> > - copy-related .c files won't need to include copyapi.h if they don't
> > use custom copy format routines.
>
> Hmm. I thought Copy{To,From}State are "public" API not
> "private" API for extensions. Because extensions need to use
> at least Copy{To,From}State::opaque directly. If we want to
> make Copy{To,From}State private, I think that we should
> provide getter/setter for needed members of
> Copy{To,From}State such as
> Copy{To,From}State{Get,Set}Opaque().
>
> It's a design in the v2 patch set:
> https://www.postgresql.org/message-id/20231221.183504.1240642084042888377.kou%40clear-code.com
>
> We discussed that we can make CopyToState public:
> https://www.postgresql.org/message-id/CAD21AoD%3DUapH4Wh06G6H5XAzPJ0iJg9YcW8r7E2UEJkZ8QsosA%40mail.gmail.com

I think that CopyToState and CopyFromState are not APIs but the
execution states. I'm not against exposing CopyToState and
CopyFromState. What I'd like to avoid is that we end up adding
everything (including new fields we add in the future) related to copy
operation to copyapi.h, leading to include copyapi.h into files that
are not related to custom format api. fdwapi.h and tsmapi.h as
examples have only a struct having a bunch of callbacks but not the
execution state data such as SampScanState are not defined there.

>
> What does "private" mean here? I thought that it means that
> "PostgreSQL itself can use it". But it seems that you mean
> that "PostgreSQL itself and custom format extensions can use
> it but other extensions can't use it".
>
> I'm not familiar with "_internal.h" in PostgreSQL but is
> "_internal.h" for the latter "private" mean?

My understanding is that we don't strictly prohibit _internal.h from
being included in out of core files. For example, file_fdw.c includes
copyfrom_internal.h in order to access some fields of CopyFromState.

If the name with _internal.h is the problem, we can rename them to
copyfrom.h and copyto.h. It makes sense to me that the code that needs
to access the internal of the copy execution state include _internal.h
header, though.

> > While we get the format routines for custom formats in
> > ProcessCopyOptionFormat(), we do that for built-in formats in
> > BeginCopyTo(), which seems odd to me. I think we can have
> > CopyToGetRoutine() responsible for getting CopyToRoutine for built-in
> > formats as well as custom format. The same is true for
> > CopyFromRoutine.
>
> I like the current design because we don't need to export
> CopyToGetBuiltinRoutine() (we can use static for
> CopyToGetBuiltinRoutine()) but I applied your
> suggestion. Because it's not a strong opinion.

I meant that ProcessCopyOptionFormat() doesn't not necessarily get the
routine. An idea is that in ProcessCopyOptionFormat() we just get the
OID of the handler function, and then set up the format routine in
BeginCopyTo(). I've attached a patch for this idea (applied on top of
0009).

Also, please check some regression test failures on cfbot.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
fix_format_option_process.patch application/octet-stream 10.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sutou Kouhei 2025-01-31 23:10:23 Re: Make COPY format extendable: Extract COPY TO format implementations
Previous Message Sami Imseih 2025-01-31 22:24:32 Re: pgbench with partitioned tables