Re: Make COPY format extendable: Extract COPY TO format implementations

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Sutou Kouhei <kou(at)clear-code(dot)com>
Cc: michael(at)paquier(dot)xyz, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2024-11-22 21:01:06
Message-ID: CAD21AoBNfKDbJnu-zONNpG820ZXYC0fuTSLrJ-UdRqU4qp2wog@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 20, 2024 at 6:55 PM Sutou Kouhei <kou(at)clear-code(dot)com> wrote:
>
> Hi,
>
> In <CAD21AoA1s0nzjGU9t3N_uNdg3SZeOxXyH3rQfxYFEN3Y7JrKRQ(at)mail(dot)gmail(dot)com>
> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 20 Nov 2024 14:14:27 -0800,
> Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> > I've extracted the changes to refactor COPY TO/FROM to use the format
> > callback routines from v23 patch set, which seems to be a better patch
> > split to me. Also, I've reviewed these changes and made some changes
> > on top of them. The attached patches are:
> >
> > 0001: make COPY TO use CopyToRoutine.
> > 0002: minor changes to 0001 patch. will be fixed up.
> > 0003: make COPY FROM use CopyFromRoutine.
> > 0004: minor changes to 0003 patch. will be fixed up.
> >
> > I've confirmed that v24 has a similar performance improvement to v23.
> > Please check these extractions and minor change suggestions.
>
> Thanks. Here are my comments:

Thank you for the comments!

>
> 0002:
>
> +/* TEXT format */
>
> "text" may be better than "TEXT". We use "text" not "TEXT"
> in other places.
>
> +static const CopyToRoutine CopyToRoutineText = {
> + .CopyToStart = CopyToTextLikeStart,
> + .CopyToOutFunc = CopyToTextLikeOutFunc,
> + .CopyToOneRow = CopyToTextOneRow,
> + .CopyToEnd = CopyToTextLikeEnd,
> +};
>
> +/* BINARY format */
>
> "binary" may be better than "BINARY". We use "binary" not
> "BINARY" in other places.
>
> +static const CopyToRoutine CopyToRoutineBinary = {
> + .CopyToStart = CopyToBinaryStart,
> + .CopyToOutFunc = CopyToBinaryOutFunc,
> + .CopyToOneRow = CopyToBinaryOneRow,
> + .CopyToEnd = CopyToBinaryEnd,
> +};
>
> +/* Return COPY TO routines for the given option */
>
> option ->
> options
>
> +static const CopyToRoutine *
> +CopyToGetRoutine(CopyFormatOptions opts)

Fixed all the above comments for 0002 patch.

>
>
> 0003:
>
> diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
> index 99981b1579..224fda172e 100644
> --- a/src/include/commands/copyapi.h
> +++ b/src/include/commands/copyapi.h
> @@ -56,4 +56,46 @@ typedef struct CopyToRoutine
> void (*CopyToEnd) (CopyToState cstate);
> } CopyToRoutine;
>
> +/*
> + * API structure for a COPY FROM format implementation. Note this must be
>
> Should we remove a tab character here?

Fixed.

>
> 0004:
>
> diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
> index e77986f9a9..7f1de8a42b 100644
> --- a/src/backend/commands/copyfrom.c
> +++ b/src/backend/commands/copyfrom.c
> @@ -106,31 +106,65 @@ typedef struct CopyMultiInsertInfo
> /* non-export function prototypes */
> static void ClosePipeFromProgram(CopyFromState cstate);
>
> -
> /*
> - * CopyFromRoutine implementations for text and CSV.
> + * built-in format-specific routines. One-row callbacks are defined in
>
> built-in ->
> Built-in
>
> /*
> - * CopyFromTextLikeInFunc
> - *
> - * Assign input function data for a relation's attribute in text/CSV format.
> + * COPY FROM routines for built-in formats.
> ++
>
> "+" ->
> " *"
>
> +/* TEXT format */
>
> TEXT -> text?
>
> +/* BINARY format */
>
> BINARY -> binary?
>
> +/* Return COPY FROM routines for the given option */
>
> option ->
> options
>
> +static const CopyFromRoutine *
> +CopyFromGetRoutine(CopyFormatOptions opts)

Fixed.

>
> diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
> index 0447c4df7e..5416583e94 100644
> --- a/src/backend/commands/copyfromparse.c
> +++ b/src/backend/commands/copyfromparse.c
>
> +static bool CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
> + Datum *values, bool *nulls, bool is_csv);
>
> Oh, I didn't know that we don't need inline in a function
> declaration.

Removed this function declaration as it's not necessarily necessary.

>
> @@ -1237,7 +1219,7 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
> /*
> * CopyReadLineText - inner loop of CopyReadLine for text mode
> */
> -static pg_attribute_always_inline bool
> +static bool
> CopyReadLineText(CopyFromState cstate, bool is_csv)
>
> Is this an intentional change?
> CopyReadLineText() has "bool in_csv".

Yes, I'm not sure it's really necessary to make it inline since the
benchmark results don't show much difference. Probably this is because
the function has 'is_csv' in some 'if' branches but the compiler
cannot optimize out the whole 'if' branches as most 'if' branches
check 'is_csv' and other variables.

I've attached the v25 patches that squashed the minor changes I made
in v24 and incorporated all comments I got so far. I think these two
patches are in good shape. Could you rebase remaining patches on top
of them so that we can see the big picture of this feature?

Regarding exposing the structs such as CopyToStateData, v22-0004 patch
moves most of all copy-related structs to copyapi.h from copyto.c,
copyfrom_internal.h, and copy.h, which seems odd to me. I think we can
expose CopyToStateData (and related structs) in a new file
copyto_internal.h and keep other structs in the original header files.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v25-0002-Refactor-COPY-FROM-to-use-format-callback-functi.patch application/octet-stream 33.6 KB
v25-0001-Refactor-COPY-TO-to-use-format-callback-function.patch application/octet-stream 17.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message dinesh salve 2024-11-22 21:02:26 Re: explain plans for foreign servers
Previous Message Devulapalli, Raghuveer 2024-11-22 21:00:01 RE: Use __attribute__((target(sse4.2))) for SSE42 CRC32C