Re: Make COPY format extendable: Extract COPY TO format implementations

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Sutou Kouhei <kou(at)clear-code(dot)com>
Cc: michael(at)paquier(dot)xyz, zhjwpku(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2025-02-25 22:05:28
Message-ID: CAD21AoBjzkL2Lv7j4teaHBZvNmKctQtH6X71kN_sj6Fm-+VvJQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Feb 20, 2025 at 6:48 PM Sutou Kouhei <kou(at)clear-code(dot)com> wrote:
>
> Hi,
>
> In <CAD21AoAni3cKToPfdShTsc0NmaJOtbJuUb=skyz3Udj7HZY7dA(at)mail(dot)gmail(dot)com>
> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 20 Feb 2025 15:28:26 -0800,
> Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> > Looking at the 0001 patch again, I have a question: we have
> > CopyToTextLikeOneRow() for both CSV and text format:
> >
> > +/* Implementation of the per-row callback for text format */
> > +static void
> > +CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot)
> > +{
> > + CopyToTextLikeOneRow(cstate, slot, false);
> > +}
> > +
> > +/* Implementation of the per-row callback for CSV format */
> > +static void
> > +CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot)
> > +{
> > + CopyToTextLikeOneRow(cstate, slot, true);
> > +}
> >
> > These two functions pass different is_csv value to that function,
> > which is used as follows:
> >
> > + if (is_csv)
> > + CopyAttributeOutCSV(cstate, string,
> > +
> > cstate->opts.force_quote_flags[attnum - 1]);
> > + else
> > + CopyAttributeOutText(cstate, string);
> >
> > However, we can know whether the format is CSV or text by checking
> > cstate->opts.csv_mode instead of passing is_csv. That way, we can
> > directly call CopyToTextLikeOneRow() but not via CopyToCSVOneRow() or
> > CopyToTextOneRow(). It would not help performance since we already
> > inline CopyToTextLikeOneRow(), but it looks simpler.
>
> This means the following, right?
>
> 1. We remove CopyToTextOneRow() and CopyToCSVOneRow()
> 2. We remove "bool is_csv" parameter from CopyToTextLikeOneRow()
> and use cstate->opts.csv_mode in CopyToTextLikeOneRow()
> instead of is_csv
> 3. We use CopyToTextLikeOneRow() for
> CopyToRoutineText::CopyToOneRow and
> CopyToRoutineCSV::CopyToOneRow
>
> If we use this approach, we can't remove the following
> branch in compile time:
>
> + if (is_csv)
> + CopyAttributeOutCSV(cstate, string,
> + cstate->opts.force_quote_flags[attnum - 1]);
> + else
> + CopyAttributeOutText(cstate, string);
>
> We can remove the branch in compile time with the current
> approach (constant argument + inline).
>
> It may have a negative performance impact because the "if"
> is used many times with large data. (That's why we choose
> the constant argument + inline approach in this thread.)
>

Thank you for the explanation, I missed that fact. I'm fine with having is_csv.

The first two patches are refactoring patches (+ small performance
improvements). I've reviewed these patches again and attached the
updated patches. I reorganized the function order and updated comments
etc. I find that these patches are reasonably ready to push. Could you
review these versions? I'm going to push them, barring objections and
further comments.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v33-0002-Refactor-COPY-FROM-to-use-format-callback-functi.patch application/octet-stream 32.5 KB
v33-0001-Refactor-COPY-TO-to-use-format-callback-function.patch application/octet-stream 18.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Melanie Plageman 2025-02-25 22:13:09 Re: Trigger more frequent autovacuums of heavy insert tables
Previous Message Tom Lane 2025-02-25 21:52:08 Re: pgbench client-side performance issue on large scripts