Re: Make COPY format extendable: Extract COPY TO format implementations

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Sutou Kouhei <kou(at)clear-code(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2024-10-07 22:23:08
Message-ID: CAD21AoD67TAO6KkBecKBsLgR1tgYJS1AwiN9NQJSLE0WYw8pDA@mail.gmail.com
Lists: pgsql-hackers

On Sat, Sep 28, 2024 at 8:56 AM Sutou Kouhei <kou(at)clear-code(dot)com> wrote:
>
> Hi,
>
> In <CAD21AoCwMmwLJ8PQLnZu0MbB4gDJiMvWrHREQD4xRp3-F2RU2Q(at)mail(dot)gmail(dot)com>
> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 27 Sep 2024 16:33:13 -0700,
> Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> >> * 0005 (that add "void *opaque" to Copy{From,To}StateData)
> >> has a bit negative impact for FROM and a bit positive
> >> impact for TO
> >> * But I don't know why. This doesn't change per row
> >> related codes. Increasing Copy{From,To}StateData size
> >> ("void *opaque" is added) may be related.
> >
> > I was surprised that the 0005 patch made COPY FROM slower (with fewer
> > rows) and COPY TO faster overall in spite of just adding one struct
> > field and some functions.
>
> Me too...
>
> > I'm interested in why the performance trends of COPY FROM are
> > different between fewer than 6M rows and more than 6M rows.
>
> My hypothesis:
>
> With this patch set:
> 1. One row processing is faster than master.
> 2. Non row related processing is slower than master.
>
> If we have many rows, 1. impact is greater than 2. impact.
>
>
> > Separating the patches into two parts (one is for COPY TO and another
> > one is for COPY FROM) could be a good idea. It would help reviews and
> > investigate performance regression in COPY FROM cases. And I think we
> > can commit them separately.
> >
> > Also, could you please rebase the patches as they conflict with the
> > current HEAD?
>
> OK. I've prepared 2 patch sets:
>
> v20: It just rebased on master. It still mixes COPY TO and
> COPY FROM implementations.
>
> v21: It's based on v20 but splits COPY TO implementations
> and COPY FROM implementations.
> 0001-0005 includes only COPY TO related changes.
> 0006-0010 includes only COPY FROM related changes.
>
> (v21 0001 + 0006) == (v20 v0001),
> (v21 0002 + 0007) == (v20 v0002) and so on.
>
> > I'll run some benchmarks on my environment as well.
>

Thank you for updating the patches!

I've run the same benchmark script on several of my machines (a Mac,
Linux machines with Intel and Ryzen CPUs, a Raspberry Pi, etc.). I
haven't investigated the results in depth yet, but let me share them.
Please find the attached file, extensible_copy_benchmark_20241007.pdf.

For the benchmark, I applied the v20 patch set; 'master' in the
results refers to a19f83f87966. I disabled CPU turbo boost where
possible. Overall, the v20 patch set performed similarly to or better
than master in both COPY FROM and COPY TO, except on macOS. I'm not
sure whether changes made to master since the last benchmark runs by
Tomas and Sutou-san contribute to these results. I'll try to
investigate the performance regression on macOS. I think the other
performance differences in my results are within noise and are
acceptable. Of course, it would be great if others could also run the
benchmarks.
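
As an aside on the hypothesis quoted above (per-row processing is
faster, non-row processing is slower), here is a minimal sketch of
the simple linear cost model it implies. The function names and all
numbers are made up for illustration; they are not taken from the
benchmark results, and were chosen only so that the crossover lands
at 6M rows.

```python
def copy_time(rows, fixed_cost, per_row_cost):
    """Total time under a linear model: fixed (non-row) work plus per-row work."""
    return fixed_cost + rows * per_row_cost


def break_even_rows(fixed_a, per_row_a, fixed_b, per_row_b):
    """Row count where variants A and B take the same time.

    Returns None when the per-row costs are equal or the crossover
    is not at a positive row count.
    """
    if per_row_a == per_row_b:
        return None
    rows = (fixed_b - fixed_a) / (per_row_a - per_row_b)
    return rows if rows > 0 else None


# Hypothetical costs in nanoseconds (NOT measured values):
# "master" has no extra fixed cost but a slower per-row path,
# "patched" pays a larger fixed cost but is faster per row.
MASTER = {"fixed_cost": 0, "per_row_cost": 1000}
PATCHED = {"fixed_cost": 300_000_000, "per_row_cost": 950}

crossover = break_even_rows(
    MASTER["fixed_cost"], MASTER["per_row_cost"],
    PATCHED["fixed_cost"], PATCHED["per_row_cost"],
)
print(f"break-even at about {crossover:,.0f} rows")
for rows in (1_000_000, 10_000_000):
    faster = "patched" if copy_time(rows, **PATCHED) < copy_time(rows, **MASTER) else "master"
    print(f"{rows:>10,} rows: {faster} is faster")
```

Under this model the patched build loses below the crossover and wins
above it, which is the shape of the trend change discussed above. It
says nothing about where the extra fixed cost would come from.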

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
extensible_copy_benchmark_20241007.pdf application/pdf 53.5 KB
