Re: Make COPY format extendable: Extract COPY TO format implementations

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Sutou Kouhei <kou(at)clear-code(dot)com>
Cc: zhjwpku(at)gmail(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2025-02-01 00:34:52
Message-ID: CAD21AoA3KMddnjxY1hxth3f4f1wo8a8i2icgK6GEKqXNR_e6jA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 31, 2025 at 3:10 PM Sutou Kouhei <kou(at)clear-code(dot)com> wrote:
>
> Hi,
>
> In <CAD21AoBpWFU4k-_bwrTq0AkFSAdwQqhAsSW188STmu9HxLJ0nQ(at)mail(dot)gmail(dot)com>
> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 31 Jan 2025 14:25:34 -0800,
> Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> > I think that CopyToState and CopyFromState are not APIs but the
> > execution states. I'm not against exposing CopyToState and
> > CopyFromState. What I'd like to avoid is that we end up adding
> > everything (including new fields we add in the future) related to copy
> > operation to copyapi.h, leading to include copyapi.h into files that
> > are not related to custom format api. fdwapi.h and tsmapi.h as
> > examples have only a struct having a bunch of callbacks but not the
> > execution state data such as SampScanState are not defined there.
>
> Thanks for sharing examples. But it seems that
> fdwapi.h/tsmapi.h (ForeignScanState/SampleScanSate) are not
> good examples. It seems that PostgreSQL uses
> nodes/execnodes.h for all *ScanState. It seems that the
> sparation is not related to *api.h usage.

I didn't mean these examples perfectly apply the copyapi.h case.
Again, what I'd like to avoid is that we end up adding everything
(including new fields we add in the future) related to copy operation
to copyapi.h. For example, with v28 that moves both CopyFromState and
CopyToState to copyapi.h, file_fdw.c includes unrelated CopyToState
struct via copyfrom_internal.h -> copyapi.h. In addition to that, both
copyfrom.c and copyfrom_internal.h did the same, which made me think
copyfrom_internal.h mostly no longer plays its role. I'm very welcome
to other ideas too if they could achieve the same goal.

>
> > My understanding is that we don't strictly prohibit _internal.h from
> > being included in out of core files. For example, file_fdw.c includes
> > copyfrom_internal.h in order to access some fields of CopyFromState.
> >
> > If the name with _internal.h is the problem, we can rename them to
> > copyfrom.h and copyto.h. It makes sense to me that the code that needs
> > to access the internal of the copy execution state include _internal.h
> > header, though.
>
> Thanks for sharing the file_fdw.c example. I'm OK with
> _internal.h suffix because PostgreSQL doesn't prohibit
> _internal.h usage by extensions as you mentioned.
>
> >> > While we get the format routines for custom formats in
> >> > ProcessCopyOptionFormat(), we do that for built-in formats in
> >> > BeginCopyTo(), which seems odd to me. I think we can have
> >> > CopyToGetRoutine() responsible for getting CopyToRoutine for built-in
> >> > formats as well as custom format. The same is true for
> >> > CopyFromRoutine.
> >>
> >> I like the current design because we don't need to export
> >> CopyToGetBuiltinRoutine() (we can use static for
> >> CopyToGetBuiltinRoutine()) but I applied your
> >> suggestion. Because it's not a strong opinion.
> >
> > I meant that ProcessCopyOptionFormat() doesn't not necessarily get the
> > routine. An idea is that in ProcessCopyOptionFormat() we just get the
> > OID of the handler function, and then set up the format routine in
> > BeginCopyTo(). I've attached a patch for this idea (applied on top of
> > 0009).
>
> Oh, sorry. I misunderstood your suggestion. I understand
> what you suggested by the patch. Thanks.
>
> If we use the approach, we can't show error position when a
> custom COPY format handler function returns invalid routine
> because DefElem for the "format" option isn't available in
> BeginCopyTo(). Is it acceptable? If it's acceptable, let's
> use the approach.

I think we can live with it. All errors happening while processing the
copy options don't necessarily show the error position.

> Oh, sorry. I forgot to follow function name change in
> 0009. I attach the v30 patch set that fixes it in 0009.

Thank you for updating the patch! I'll review these patches.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2025-02-01 02:20:55 Re: jsonlog missing from logging_collector description
Previous Message Andres Freund 2025-01-31 23:21:54 Re: postgresql.conf.sample ordering for IO, worker related GUCs