Re: Make COPY format extendable: Extract COPY TO format implementations

From: Sutou Kouhei <kou(at)clear-code(dot)com>
To: tomas(dot)vondra(at)enterprisedb(dot)com
Cc: andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, zhjwpku(at)gmail(dot)com, andrew(at)dunslane(dot)net, nathandbossart(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2024-07-30 07:13:06
Message-ID: 20240730.161306.1949365750142390208.kou@clear-code.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

In <26541788-8853-4d93-86cd-5f701b13ae51(at)enterprisedb(dot)com>
"Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 29 Jul 2024 14:17:08 +0200,
Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:

> I wrote a simple script to automate the benchmark - it just runs these
> tests with different parameters (number of columns and number of
> imported/exported rows). See the run.sh attachment, along with two CSV
> results from current master and with all patches applied.

Thanks. I also used the script with some modifications:

1. Create a test database automatically
2. Enable blackhole_am automatically
3. Create create_table_cols() automatically

I attach it. I also attach results of master and patched. My
results are from my desktop. So it's probably noisy.

> - For COPY FROM there is no difference - the results are within 1% of
> master, and there's no systemic difference.
>
> - For COPY TO it's a different story, though. There's a pretty clear
> regression, by ~5%. It's a bit interesting the correlation with the
> number of columns is not stronger ...

My results showed different trend:

- COPY FROM: Patched is about 15-20% slower than master
- COPY TO: Patched is a bit faster than master

Here are some my numbers:

type n_cols n_rows diff master patched
----------------------------------------------------------
TO 5 1 100.56% 218.376000 219.609000
FROM 5 1 113.33% 168.493000 190.954000
...
TO 5 5 100.60% 1037.773000 1044.045000
FROM 5 5 116.46% 767.966000 894.377000
...
TO 5 10 100.15% 2092.245000 2095.472000
FROM 5 10 115.91% 1508.160000 1748.130000
TO 10 1 98.62% 353.087000 348.214000
FROM 10 1 118.65% 260.551000 309.133000
...
TO 10 5 96.89% 1724.061000 1670.427000
FROM 10 5 119.92% 1224.098000 1467.941000
...
TO 10 10 98.70% 3444.291000 3399.538000
FROM 10 10 118.79% 2462.314000 2924.866000
TO 15 1 97.71% 492.082000 480.802000
FROM 15 1 115.59% 347.820000 402.033000
...
TO 15 5 98.32% 2402.419000 2362.140000
FROM 15 5 115.48% 1657.594000 1914.245000
...
TO 15 10 96.91% 4830.319000 4681.145000
FROM 15 10 115.09% 3304.798000 3803.542000
TO 20 1 96.05% 629.828000 604.939000
FROM 20 1 118.50% 438.673000 519.839000
...
TO 20 5 97.15% 3084.210000 2996.331000
FROM 20 5 115.35% 2110.909000 2435.032000
...
TO 25 1 98.29% 764.779000 751.684000
FROM 25 1 115.13% 519.686000 598.301000
...
TO 25 5 94.08% 3843.996000 3616.614000
FROM 25 5 115.62% 2554.008000 2952.928000
...
TO 25 10 97.41% 7504.865000 7310.549000
FROM 25 10 117.25% 4994.463000 5856.029000
TO 30 1 94.39% 906.324000 855.503000
FROM 30 1 119.60% 604.110000 722.491000
...
TO 30 5 96.50% 4419.907000 4265.417000
FROM 30 5 116.97% 2932.883000 3430.556000
...
TO 30 10 94.39% 8974.878000 8470.991000
FROM 30 10 117.84% 5800.793000 6835.900000
----

See the attached diff.txt for full numbers.
I also attach scripts to generate the diff.txt. Here is the
command line I used:

----
ruby diff.rb <(ruby aggregate.rb master.result) <(ruby aggregate.rb patched.result) | tee diff.txt
----

My environment:

* Debian GNU/Linux sid
* gcc (Debian 13.3.0-2) 13.3.0
* AMD Ryzen 9 3900X 12-Core Processor

I'll look into this.

If someone is interested in this proposal, could you share
your numbers?

> It's interesting the main change in the flamegraphs is CopyToStateFlush
> pops up on the left side. Because, what is that about? That is a thing
> introduced in the 0005 patch, so maybe the regression is not strictly
> about the existing formats moving to the new API, but due to something
> else in a later version of the patch?

Ah, making static CopySendEndOfRow() a to non-static function
(CopyToStateFlush()) may be the reason of this. Could you
try the attached v19 patch? It changes the 0005 patch:

* It reverts the static change
* It adds a new non-static function that just exports
CopySendEndOfRow()

Thanks,
--
kou

Attachment Content-Type Size
unknown_filename text/plain 1.6 KB
unknown_filename text/plain 32.6 KB
unknown_filename text/plain 32.6 KB
unknown_filename text/plain 4.7 KB
unknown_filename text/plain 673 bytes
unknown_filename text/plain 408 bytes
v19-0001-Add-CopyFromRoutine-CopyToRountine.patch text/x-patch 10.4 KB
v19-0002-Use-CopyFromRoutine-CopyToRountine-for-the-exist.patch text/x-patch 43.7 KB
v19-0003-Add-support-for-adding-custom-COPY-TO-format.patch text/x-patch 20.2 KB
v19-0004-Export-CopyToStateData-and-CopyFromStateData.patch text/x-patch 31.1 KB
v19-0005-Add-support-for-implementing-custom-COPY-TO-FROM.patch text/x-patch 3.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Anthonin Bonnefoy 2024-07-30 07:21:25 Re: Use pgBufferUsage for block reporting in analyze
Previous Message Zhang Mingli 2024-07-30 07:07:10 Re: COPY FROM crash