| From: | Sutou Kouhei <kou(at)clear-code(dot)com> | 
|---|---|
| To: | andrew(at)dunslane(dot)net | 
| Cc: | michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, zhjwpku(at)gmail(dot)com, nathandbossart(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org | 
| Subject: | Re: Make COPY format extendable: Extract COPY TO format implementations | 
| Date: | 2024-01-24 14:17:26 | 
| Message-ID: | 20240124.231726.1771099323950062661.kou@clear-code.com | 
| Lists: | pgsql-hackers | 
Hi,
In <10025bac-158c-ffe7-fbec-32b42629121f(at)dunslane(dot)net>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 24 Jan 2024 07:15:55 -0500,
  Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> 
> On 2024-01-24 We 03:11, Michael Paquier wrote:
>> On Wed, Jan 24, 2024 at 02:49:36PM +0900, Sutou Kouhei wrote:
>>> For COPY TO:
>>>
>>> 0001: This adds CopyToRoutine and use it for text/csv/binary
>>> formats. No implementation change. This just move codes.
>> 10M without this change:
>>
>>      format,elapsed time (ms)
>>      text,1090.763
>>      csv,1136.103
>>      binary,1137.141
>>
>> 10M with this change:
>>
>>      format,elapsed time (ms)
>>      text,1082.654
>>      csv,1196.991
>>      binary,1069.697
>>
>> These numbers point out that binary is faster by 6%, csv is slower by
>> 5%, while text stays around what looks like noise range.  That's not
>> negligible.  Are these numbers reproducible?  If they are, that could
>> be a problem for anybody doing bulk-loading of large data sets.  I am
>> not sure to understand where the improvement for binary comes from by
>> reading the patch, but perhaps perf would tell more for each format?
>> The loss with csv could be blamed on the extra manipulations of the
>> function pointers, likely.
> 
> 
> I don't think that's at all acceptable.
> 
> We've spent quite a lot of blood sweat and tears over the years to make COPY
> fast, and we should not sacrifice any of that lightly.
These numbers aren't reproducible, because I ran these
benchmarks on my everyday machine, not on a machine
dedicated to benchmarking. The machine also runs other
processes such as an editor and a Web browser.
For example, here are some results with master
(94edfe250c6a200d2067b0debfe00b4122e9b11e):
Format,N records,Elapsed time (ms)
csv,10000000,1073.715
csv,10000000,1022.830
csv,10000000,1073.584
csv,10000000,1090.651
csv,10000000,1052.259
Here are some results with master + the 0001 patch:
Format,N records,Elapsed time (ms)
csv,10000000,1025.356
csv,10000000,1067.202
csv,10000000,1014.563
csv,10000000,1032.088
csv,10000000,1058.110
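To compare the two result sets above, here is a minimal Python sketch (an illustration only, not the benchmark script; the variable and function names are made up for this example). It computes the mean and sample standard deviation of each set of five runs:

```python
from statistics import mean, stdev

# Elapsed times (ms) copied from the two csv result sets above.
master_ms = [1073.715, 1022.830, 1073.584, 1090.651, 1052.259]
patched_ms = [1025.356, 1067.202, 1014.563, 1032.088, 1058.110]


def summarize(samples):
    """Return (mean, sample standard deviation) of the timings."""
    return mean(samples), stdev(samples)


master_mean, master_sd = summarize(master_ms)
patched_mean, patched_sd = summarize(patched_ms)
diff = patched_mean - master_mean

print(f"master:      mean={master_mean:.1f} ms, sd={master_sd:.1f} ms")
print(f"master+0001: mean={patched_mean:.1f} ms, sd={patched_sd:.1f} ms")
print(f"difference:  {diff:+.1f} ms ({diff / master_mean:+.1%})")
```

With these samples, the difference between the two means is roughly the same size as the run-to-run standard deviation of each set. Five runs per side on a busy machine are too few to draw a firm conclusion either way.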
I uploaded my benchmark script so that you can run the same
benchmark on your machine:
https://gist.github.com/kou/be02e02e5072c91969469dbf137b5de5
Could anyone try the benchmark with master and master+0001?
Thanks,
-- 
kou