
Re: COPY performance on Windows

From:	Vladlen Popolitov <v(dot)popolitov(at)postgrespro(dot)ru>
To:	"Ryohei Takahashi (Fujitsu)" <r(dot)takahashi_2(at)fujitsu(dot)com>
Cc:	'Robert Haas' <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: COPY performance on Windows
Date:	2024-12-18 10:41:37
Message-ID:	c0a91623b39cd57dc8c3c0e20180ff54@postgrespro.ru
Lists:	pgsql-hackers

Ryohei Takahashi (Fujitsu) wrote on 2024-12-16 15:10:
Hi
> Please use the "test.sh" in the following e-mail.
> https://www.postgresql.org/message-id/flat/TY3PR01MB11891C0FD066F069B113A2376823E2%40TY3PR01MB11891.jpnprd01.prod.outlook.com#8455c9f7b66780a356511f5cfe029d57
I cannot reproduce your results. In all of my runs the final result depends
on the run order:
the benchmark for the first version takes more time, then the time gets smaller.
For example, my last run (in start-time order, time in seconds):
PG164: nclients = 1, time = 251
PG164: nclients = 2, time = 210
PG164: nclients = 4, time = 126
PG164: nclients = 8, time = 107
PG164: nclients = 16, time = 99
PG164: nclients = 32, time = 109
PG164: nclients = 64, time = 112
PG164: nclients = 128, time = 113
PG164: nclients = 256, time = 120
PG166: nclients = 1, time = 244
PG166: nclients = 2, time = 222
PG166: nclients = 4, time = 131
PG166: nclients = 8, time = 109
PG166: nclients = 16, time = 101
PG166: nclients = 32, time = 110
PG166: nclients = 64, time = 115
PG166: nclients = 128, time = 116
PG166: nclients = 256, time = 123
PG170: nclients = 1, time = 240
PG170: nclients = 2, time = 213
PG170: nclients = 4, time = 129
PG170: nclients = 8, time = 110
PG170: nclients = 16, time = 101
PG170: nclients = 32, time = 112
PG170: nclients = 64, time = 115
PG170: nclients = 128, time = 116
PG170: nclients = 256, time = 122
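
To compare the versions side by side, the timings above can be grouped by client count. This is a minimal sketch in Python, not part of the original test.sh; the names parse_results and spread are hypothetical helpers, and the data is copied verbatim from the list above.

```python
import re
from collections import defaultdict

# Timings copied from the run above (seconds, in start-time order).
RESULTS = """\
PG164: nclients = 1, time = 251
PG164: nclients = 2, time = 210
PG164: nclients = 4, time = 126
PG164: nclients = 8, time = 107
PG164: nclients = 16, time = 99
PG164: nclients = 32, time = 109
PG164: nclients = 64, time = 112
PG164: nclients = 128, time = 113
PG164: nclients = 256, time = 120
PG166: nclients = 1, time = 244
PG166: nclients = 2, time = 222
PG166: nclients = 4, time = 131
PG166: nclients = 8, time = 109
PG166: nclients = 16, time = 101
PG166: nclients = 32, time = 110
PG166: nclients = 64, time = 115
PG166: nclients = 128, time = 116
PG166: nclients = 256, time = 123
PG170: nclients = 1, time = 240
PG170: nclients = 2, time = 213
PG170: nclients = 4, time = 129
PG170: nclients = 8, time = 110
PG170: nclients = 16, time = 101
PG170: nclients = 32, time = 112
PG170: nclients = 64, time = 115
PG170: nclients = 128, time = 116
PG170: nclients = 256, time = 122
"""

LINE_RE = re.compile(r"(PG\d+): nclients = (\d+), time = (\d+)")


def parse_results(text):
    """Return {nclients: {version: seconds}} from the benchmark output."""
    table = defaultdict(dict)
    for version, nclients, seconds in LINE_RE.findall(text):
        table[int(nclients)][version] = int(seconds)
    return dict(table)


def spread(table):
    """Return {nclients: slowest minus fastest version, in seconds}."""
    return {n: max(t.values()) - min(t.values()) for n, t in table.items()}


if __name__ == "__main__":
    table = parse_results(RESULTS)
    for nclients, diff in sorted(spread(table).items()):
        print(f"nclients = {nclients}: {table[nclients]}, spread = {diff} s")
```

For this run the difference between versions is 11 to 12 seconds with 1-2 clients and 2 to 5 seconds with 4 or more clients.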

I slightly modified your script:
1) I moved the creation of the input files into a separate step to reduce
the influence of the system disk cache.
2) I ran the PostgreSQL servers on one PC (Windows 10, 11th Gen Intel(R)
Core(TM) i5-1135G7 @ 2.40GHz, RAM 16GB)
and the clients on a separate PC.
3) I added a CHECKPOINT at the end of every COPY FROM to flush the WAL.
4) I used the EDB builds for Windows from their site. Unfortunately, they
distribute
the files without debug symbols, like other distributions, which does not help
during profiling.
5) I think it would be better to make shared_buffers as small as possible
to measure all of the IO time,
but I used 25% of RAM.
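
Modification 3 (a CHECKPOINT after every load, so that the flush is included in the measured time) could look roughly like the following from the client side. This is only a sketch under assumptions: it is not the modified test.sh, it assumes the client loads the file with psql's \copy (the original script may use a different form of COPY FROM), and the host, database, table and file names are placeholders. CHECKPOINT also requires sufficient privileges on the server.

```python
import subprocess
import time


def build_psql_command(host, port, dbname, table, input_file):
    """Build a psql invocation that loads a file and then forces a checkpoint.

    psql runs multiple -c options in order; each one is either SQL or a
    single backslash command such as \\copy.
    """
    return [
        "psql",
        "-h", host,
        "-p", str(port),
        "-d", dbname,
        "-v", "ON_ERROR_STOP=1",  # stop at the first error
        "-c", f"\\copy {table} from '{input_file}'",
        "-c", "CHECKPOINT",
    ]


def timed_load(cmd):
    """Run the command and return the elapsed wall-clock time in seconds."""
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    return time.monotonic() - start


if __name__ == "__main__":
    # Placeholder connection details; adjust before running against a server.
    cmd = build_psql_command("dbhost", 5432, "postgres", "t", "input.csv")
    print(" ".join(cmd))
```

Calling timed_load(cmd) once per client process would then give a time that covers both the load and the checkpoint.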

My observations
1) With 1-2 clients, the read time decreases with every run (independent of
the Postgres version).
This looks like an effect of the Windows disk cache (I think of NTFS system
information such as the btree of file locations,
not the input file itself). This contradicts your main point that
version 17.0 is slower.

2) With 1 client, the Postgres backend takes only 12% of CPU; the rest of the
time it waits on kernel operations.

3) With 16-256 clients, I have not analyzed what makes the time increase
as more processes are added:
the OS process implementation, waiting on PostgreSQL locks or spinlocks,
parallel access to one
input file, or other factors.

Could you confirm that you get your results with all execution orders
(17.0 first and 17.0 last)?
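
One way to check the order dependence is to repeat the benchmark with the versions in different orders. A minimal sketch in Python, assuming the version labels used in the timings above; run_orders and first_and_last_orders are hypothetical helper names, and the step that actually runs the benchmark is left out.

```python
from itertools import permutations

# Version labels as used in the timing list above.
VERSIONS = ["PG164", "PG166", "PG170"]


def run_orders(versions):
    """Return every possible execution order of the given versions."""
    return [list(order) for order in permutations(versions)]


def first_and_last_orders(versions, target):
    """Return two orders: one with target run first, one with it run last."""
    others = [v for v in versions if v != target]
    return [target] + others, others + [target]


if __name__ == "__main__":
    first, last = first_and_last_orders(VERSIONS, "PG170")
    print("17.0 first:", first)
    print("17.0 last: ", last)
    print("all orders:", len(run_orders(VERSIONS)))
```

If 17.0 is slower only when it runs first, the difference would point to caching rather than to the version.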

--
Best regards,

Vladlen Popolitov.

In response to

RE: COPY performance on Windows at 2024-12-16 12:10:03 from Ryohei Takahashi (Fujitsu)

Responses

RE: COPY performance on Windows at 2024-12-19 13:13:27 from Ryohei Takahashi (Fujitsu)
