From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Sutou Kouhei <kou(at)clear-code(dot)com> |
Cc: | michael(at)paquier(dot)xyz, zhjwpku(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Make COPY format extendable: Extract COPY TO format implementations |
Date: | 2025-02-25 22:05:28 |
Message-ID: | CAD21AoBjzkL2Lv7j4teaHBZvNmKctQtH6X71kN_sj6Fm-+VvJQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Feb 20, 2025 at 6:48 PM Sutou Kouhei <kou(at)clear-code(dot)com> wrote:
>
> Hi,
>
> In <CAD21AoAni3cKToPfdShTsc0NmaJOtbJuUb=skyz3Udj7HZY7dA(at)mail(dot)gmail(dot)com>
> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 20 Feb 2025 15:28:26 -0800,
> Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> > Looking at the 0001 patch again, I have a question: we have
> > CopyToTextLikeOneRow() for both CSV and text format:
> >
> > +/* Implementation of the per-row callback for text format */
> > +static void
> > +CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot)
> > +{
> > + CopyToTextLikeOneRow(cstate, slot, false);
> > +}
> > +
> > +/* Implementation of the per-row callback for CSV format */
> > +static void
> > +CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot)
> > +{
> > + CopyToTextLikeOneRow(cstate, slot, true);
> > +}
> >
> > These two functions pass different is_csv value to that function,
> > which is used as follows:
> >
> > + if (is_csv)
> > + CopyAttributeOutCSV(cstate, string,
> > +
> > cstate->opts.force_quote_flags[attnum - 1]);
> > + else
> > + CopyAttributeOutText(cstate, string);
> >
> > However, we can know whether the format is CSV or text by checking
> > cstate->opts.csv_mode instead of passing is_csv. That way, we can
> > directly call CopyToTextLikeOneRow() but not via CopyToCSVOneRow() or
> > CopyToTextOneRow(). It would not help performance since we already
> > inline CopyToTextLikeOneRow(), but it looks simpler.
>
> This means the following, right?
>
> 1. We remove CopyToTextOneRow() and CopyToCSVOneRow()
> 2. We remove "bool is_csv" parameter from CopyToTextLikeOneRow()
> and use cstate->opts.csv_mode in CopyToTextLikeOneRow()
> instead of is_csv
> 3. We use CopyToTextLikeOneRow() for
> CopyToRoutineText::CopyToOneRow and
> CopyToRoutineCSV::CopyToOneRow
>
> If we use this approach, we can't remove the following
> branch in compile time:
>
> + if (is_csv)
> + CopyAttributeOutCSV(cstate, string,
> + cstate->opts.force_quote_flags[attnum - 1]);
> + else
> + CopyAttributeOutText(cstate, string);
>
> We can remove the branch in compile time with the current
> approach (constant argument + inline).
>
> It may have a negative performance impact because the "if"
> is used many times with large data. (That's why we choose
> the constant argument + inline approach in this thread.)
>
Thank you for the explanation, I missed that fact. I'm fine with having is_csv.
The first two patches are refactoring patches (+ small performance
improvements). I've reviewed these patches again and attached the
updated patches. I reorganized the function order and updated comments
etc. I find that these patches are reasonably ready to push. Could you
review these versions? I'm going to push them, barring objections and
further comments.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachment | Content-Type | Size |
---|---|---|
v33-0002-Refactor-COPY-FROM-to-use-format-callback-functi.patch | application/octet-stream | 32.5 KB |
v33-0001-Refactor-COPY-TO-to-use-format-callback-function.patch | application/octet-stream | 18.0 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Melanie Plageman | 2025-02-25 22:13:09 | Re: Trigger more frequent autovacuums of heavy insert tables |
Previous Message | Tom Lane | 2025-02-25 21:52:08 | Re: pgbench client-side performance issue on large scripts |