From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Sutou Kouhei <kou(at)clear-code(dot)com> |
Cc: | michael(at)paquier(dot)xyz, zhjwpku(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Make COPY format extendable: Extract COPY TO format implementations |
Date: | 2025-02-20 23:28:26 |
Message-ID: | CAD21AoAni3cKToPfdShTsc0NmaJOtbJuUb=skyz3Udj7HZY7dA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Feb 7, 2025 at 5:01 AM Sutou Kouhei <kou(at)clear-code(dot)com> wrote:
>
> Hi,
>
> In <CAD21AoBkDE4JwjPgcLxSEwqu3nN4VXjkYS9vpRQDwA2GwNQCsg(at)mail(dot)gmail(dot)com>
> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Tue, 4 Feb 2025 22:20:51 -0800,
> Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> >> I was just looking at bit at this series of patch labelled with v31,
> >> to see what is happening here.
> >>
> >> In 0001, we have that:
> >>
> >> + /* format-specific routines */
> >> + const CopyToRoutine *routine;
> >> [...]
> >> - CopySendEndOfRow(cstate);
> >> + cstate->routine->CopyToOneRow(cstate, slot);
> >>
> >> Having a callback where the copy state is processed once per row is
> >> neat in terms of design for the callbacks and what extensions can do,
> >> and this is much better than what 2889fd23be5 has attempted (later
> >> reverted in 1aa8324b81fa) because we don't do indirect function calls
> >> for each attribute. Still, I have a question here: what happens for a
> >> COPY TO that involves one attribute, a short field size like an int2
> >> and many rows (the more rows the more pronounced the effect, of
> >> course)? Could this level of indirection still be the cause of some
> >> regressions in a case like that? This is the worst case I can think
> >> about, on top of my mind, and I am not seeing tests with few
> >> attributes like this one, where we would try to make this callback as
> >> hot as possible. This is a performance-sensitive area.
> >
> > FYI when Sutou-san last measured the performance[1], it showed a
> > slight speed up even with fewer columns (5 columns) in both COPY TO
> > and COPY FROM cases. The callback design has not changed since then.
> > But it would be a good idea to run the benchmark with a table having a
> > single small size column.
> >
> > [1] https://www.postgresql.org/message-id/20241114.161948.1677325020727842666.kou%40clear-code.com
>
> I measured v31 patch set with 1,6,11,16,21,26,31 int2
> columns. See the attached PDF for 0001 and 0002 result.
>
> 0001 - to:
>
> It's faster than master when the number of rows are
> 1,000,000-5,000,000.
>
> It's almost same as master when the number of rows are
> 6,000,000-10,000,000.
>
> There is no significant slow down when the number of columns
> is 1.
>
> 0001 - from:
>
> 0001 doesn't change COPY FROM code. So the differences are
> not real difference.
>
> 0002 - to:
>
> 0002 doesn't change COPY TO code. So "0001 - to" and "0002 -
> to" must be the same result. But 0002 is faster than master
> for all cases. It shows that the CopyToOneRow() approach
> doesn't have significant slow down.
>
> 0002 - from:
>
> 0002 changes COPY FROM code. So this may have performance
> impact.
>
> It's almost same as master when data is smaller
> ((1,000,000-2,000,000 rows) or (3,000,000 rows and 1,6,11,16
> columns)).
>
> It's faster than master when data is larger.
>
> There is no significant slow down by 0002.
>
Thank you for sharing the benchmark results. That looks good to me.
Looking at the 0001 patch again, I have a question: we have
CopyToTextLikeOneRow() for both CSV and text format:
+/* Implementation of the per-row callback for text format */
+static void
+CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+ CopyToTextLikeOneRow(cstate, slot, false);
+}
+
+/* Implementation of the per-row callback for CSV format */
+static void
+CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+ CopyToTextLikeOneRow(cstate, slot, true);
+}
These two functions pass different is_csv value to that function,
which is used as follows:
+ if (is_csv)
+ CopyAttributeOutCSV(cstate, string,
+
cstate->opts.force_quote_flags[attnum - 1]);
+ else
+ CopyAttributeOutText(cstate, string);
However, we can know whether the format is CSV or text by checking
cstate->opts.csv_mode instead of passing is_csv. That way, we can
directly call CopyToTextLikeOneRow() but not via CopyToCSVOneRow() or
CopyToTextOneRow(). It would not help performance since we already
inline CopyToTextLikeOneRow(), but it looks simpler.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From | Date | Subject | |
---|---|---|---|
Next Message | Masahiko Sawada | 2025-02-20 23:31:49 | Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation |
Previous Message | Sami Imseih | 2025-02-20 23:03:19 | Re: Proposal - Allow extensions to set a Plan Identifier |