Re: Make COPY format extendable: Extract COPY TO format implementations

From: Sutou Kouhei <kou(at)clear-code(dot)com>
To: sawada(dot)mshk(at)gmail(dot)com
Cc: michael(at)paquier(dot)xyz, zhjwpku(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2025-02-07 13:01:17
Lists: pgsql-hackers


In <CAD21AoBkDE4JwjPgcLxSEwqu3nN4VXjkYS9vpRQDwA2GwNQCsg(at)mail(dot)gmail(dot)com>
"Re: Make COPY format extendable: Extract COPY TO format implementations" on Tue, 4 Feb 2025 22:20:51 -0800,
Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:

>> I was just looking a bit at this series of patches labelled v31,
>> to see what is happening here.
>> In 0001, we have that:
>> + /* format-specific routines */
>> + const CopyToRoutine *routine;
>> [...]
>> - CopySendEndOfRow(cstate);
>> + cstate->routine->CopyToOneRow(cstate, slot);
>> Having a callback where the copy state is processed once per row is
>> neat in terms of design for the callbacks and what extensions can do,
>> and this is much better than what 2889fd23be5 has attempted (later
>> reverted in 1aa8324b81fa) because we don't do indirect function calls
>> for each attribute. Still, I have a question here: what happens for a
>> COPY TO that involves one attribute, a short field size like an int2
>> and many rows (the more rows the more pronounced the effect, of
>> course)? Could this level of indirection still be the cause of some
>> regressions in a case like that? This is the worst case I can think
>> of off the top of my head, and I am not seeing tests with few
>> attributes like this one, where we would try to make this callback as
>> hot as possible. This is a performance-sensitive area.
> FYI when Sutou-san last measured the performance[1], it showed a
> slight speed up even with fewer columns (5 columns) in both COPY TO
> and COPY FROM cases. The callback design has not changed since then.
> But it would be a good idea to run the benchmark with a table having a
> single small size column.
> [1]

I measured the v31 patch set with 1, 6, 11, 16, 21, 26, and 31
int2 columns. See the attached PDF for the 0001 and 0002 results.

0001 - to:

It's faster than master when the number of rows is

It's almost the same as master when the number of rows is

There is no significant slowdown even when there is only 1
column.

0001 - from:

0001 doesn't change the COPY FROM code, so these differences
are not real differences.

0002 - to:

0002 doesn't change the COPY TO code, so "0001 - to" and
"0002 - to" should give the same result. Yet 0002 is faster
than master in all cases. This shows that the CopyToOneRow()
approach doesn't cause a significant slowdown.

0002 - from:

0002 changes the COPY FROM code, so it may have a performance
impact.
It's almost the same as master when the data is smaller
((1,000,000-2,000,000 rows) or (3,000,000 rows and 1,6,11,16

It's faster than master when the data is larger.

0002 causes no significant slowdown.


Attachment Content-Type Size
v31-intel-core-i7-3770-result-1-2.pdf application/pdf 53.4 KB
