From: | Sutou Kouhei <kou(at)clear-code(dot)com> |
---|---|
To: | sawada(dot)mshk(at)gmail(dot)com |
Cc: | michael(at)paquier(dot)xyz, zhjwpku(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Make COPY format extendable: Extract COPY TO format implementations |
Date: | 2025-02-07 13:01:17 |
Message-ID: | 20250207.220117.2285437283940199074.kou@clear-code.com |
Lists: | pgsql-hackers |
Hi,
In <CAD21AoBkDE4JwjPgcLxSEwqu3nN4VXjkYS9vpRQDwA2GwNQCsg(at)mail(dot)gmail(dot)com>
"Re: Make COPY format extendable: Extract COPY TO format implementations" on Tue, 4 Feb 2025 22:20:51 -0800,
Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> I was just looking a bit at this series of patches labelled with v31,
>> to see what is happening here.
>>
>> In 0001, we have that:
>>
>> + /* format-specific routines */
>> + const CopyToRoutine *routine;
>> [...]
>> - CopySendEndOfRow(cstate);
>> + cstate->routine->CopyToOneRow(cstate, slot);
>>
>> Having a callback where the copy state is processed once per row is
>> neat in terms of design for the callbacks and what extensions can do,
>> and this is much better than what 2889fd23be5 has attempted (later
>> reverted in 1aa8324b81fa) because we don't do indirect function calls
>> for each attribute. Still, I have a question here: what happens for a
>> COPY TO that involves one attribute, a short field size like an int2
>> and many rows (the more rows the more pronounced the effect, of
>> course)? Could this level of indirection still be the cause of some
>> regressions in a case like that? This is the worst case I can think
>> about, on top of my mind, and I am not seeing tests with few
>> attributes like this one, where we would try to make this callback as
>> hot as possible. This is a performance-sensitive area.
>
> FYI when Sutou-san last measured the performance[1], it showed a
> slight speed up even with fewer columns (5 columns) in both COPY TO
> and COPY FROM cases. The callback design has not changed since then.
> But it would be a good idea to run the benchmark with a table having a
> single small size column.
>
> [1] https://www.postgresql.org/message-id/20241114.161948.1677325020727842666.kou%40clear-code.com
I measured the v31 patch set with 1, 6, 11, 16, 21, 26, and 31
int2 columns. See the attached PDF for the 0001 and 0002 results.
0001 - to:
It's faster than master when the number of rows is
1,000,000-5,000,000.
It's almost the same as master when the number of rows is
6,000,000-10,000,000.
There is no significant slowdown even when the number of
columns is 1.
0001 - from:
0001 doesn't change the COPY FROM code, so the differences
are measurement noise, not real differences.
0002 - to:
0002 doesn't change the COPY TO code, so the "0001 - to" and
"0002 - to" results should be the same. Yet 0002 is faster
than master in all cases. Since the same COPY TO code measures
anywhere from about the same as master to faster than master,
the CopyToOneRow() approach doesn't cause a significant slowdown.
0002 - from:
0002 changes the COPY FROM code, so it may have a performance
impact.
It's almost the same as master when the data is smaller
(1,000,000-2,000,000 rows, or 3,000,000 rows with 1, 6, 11,
or 16 columns).
It's faster than master when the data is larger.
There is no significant slowdown from 0002.
Thanks,
--
kou
Attachment | Content-Type | Size |
---|---|---|
v31-intel-core-i7-3770-result-1-2.pdf | application/pdf | 53.4 KB |