Re: Make COPY format extendable: Extract COPY TO format implementations

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Sutou Kouhei <kou(at)clear-code(dot)com>
Cc: michael(at)paquier(dot)xyz, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2024-11-26 07:10:50
Message-ID: CAD21AoBW5dEv=Gd2iF_BYNZGEsF=3KTG7fpq=vP5qwpC1CAOeA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Nov 24, 2024 at 6:06 PM Sutou Kouhei <kou(at)clear-code(dot)com> wrote:
>
> Hi,
>
> In <CAD21AoBNfKDbJnu-zONNpG820ZXYC0fuTSLrJ-UdRqU4qp2wog(at)mail(dot)gmail(dot)com>
> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 22 Nov 2024 13:01:06 -0800,
> Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> >> @@ -1237,7 +1219,7 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
> >> /*
> >> * CopyReadLineText - inner loop of CopyReadLine for text mode
> >> */
> >> -static pg_attribute_always_inline bool
> >> +static bool
> >> CopyReadLineText(CopyFromState cstate, bool is_csv)
> >>
> >> Is this an intentional change?
> >> CopyReadLineText() has "bool in_csv".
> >
> > Yes, I'm not sure it's really necessary to make it inline since the
> > benchmark results don't show much difference. Probably this is because
> > the function has 'is_csv' in some 'if' branches but the compiler
> > cannot optimize out the whole 'if' branches as most 'if' branches
> > check 'is_csv' and other variables.
>
> I see. If explicit "inline" isn't related to performance, we
> don't need explicit "inline".
>
> > I've attached the v25 patches that squashed the minor changes I made
> > in v24 and incorporated all comments I got so far. I think these two
> > patches are in good shape. Could you rebase remaining patches on top
> > of them so that we can see the big picture of this feature?
>
> OK. I'll work on it.
>
> > Regarding exposing the structs such as CopyToStateData, v22-0004 patch
> > moves most of all copy-related structs to copyapi.h from copyto.c,
> > copyfrom_internal.h, and copy.h, which seems odd to me. I think we can
> > expose CopyToStateData (and related structs) in a new file
> > copyto_internal.h and keep other structs in the original header files.
>
> Custom COPY format extensions need to use
> CopyToStateData/CopyFromStateData. For example,
> CopyToStateData::rel is used to retrieve table schema. If we
> move CopyToStateData to copyto_internal.h not copyapi.h,
> custom COPY format extensions need to include
> copyto_internal.h. I feel that it's strange that extensions
> need to use internal headers.
>
> What is your real concern? If you don't want to export
> CopyToStateData/CopyFromStateData entirely, we can provide
> accessors only for some members of them.

I'm not against exposing CopyToStateData and CopyFromStateData. My
concern is that if we move all copy-related structs to copyapi.h,
other copy-related files would need to include copyapi.h even if the
file is not related to copy format APIs. IMO copyapi.h should have
only copy-format-API-related variables structs such as CopyFromRoutine
and CopyToRoutine and functions that custom COPY format extension can
utilize to access data source and destination, such as CopyGetData().

When it comes to CopyToStateData and CopyFromStateData, I feel that
they have mixed fields of common fields (e.g., rel, num_errors, and
transition_capture) and format-specific fields (e.g., input_buf,
line_buf, and eol_type). While it makes sense to me that custom copy
format extensions can access the common fields, it seems odd to me
that they can access text-and-csv-format-specific fields such as
input_buf. We might want to sort out these fields but it could be a
huge task.

Also, I realized that CopyFromTextLikeOneRow() does input function
calls and handle soft errors based on ON_ERROR and LOG_VERBOSITY
options. I think these should be done in the core side.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrey M. Borodin 2024-11-26 07:22:02 Re: Amcheck verification of GiST and GIN
Previous Message Kirill Reshke 2024-11-26 06:50:20 Re: Amcheck verification of GiST and GIN