From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Subject: | Re: Flushing large data immediately in pqcomm |
Date: | 2024-04-06 20:21:27 |
Message-ID: | 20240406202127.2asud6cjfq3exqew@awork3.anarazel.de |
Lists: | pgsql-hackers |
Hi,
On 2024-04-06 14:34:17 +1300, David Rowley wrote:
> I don't see any issues with v5, so based on the performance numbers
> shown on this thread for the latest patch, it would make sense to push
> it. The problem is, I just can't recreate the performance numbers.
>
> I've tried both on my AMD 3990x machine and an Apple M2 with a script
> similar to the test.sh from above. I mostly just stripped out the
> buffer size stuff and adjusted the timing code to something that would
> work with mac.
I think a few issues with the test script keep it from showing a gain:
1) Using the textual protocol, with the text datatype, makes it harder to
spot differences. That's a lot of overhead.
2) Afaict the test connects over the unix socket; I think we expect bigger
wins for TCP.
3) The larger string in particular is bottlenecked by pglz compression in
toast.
Where I had badly noticed the overhead of the current approach was streaming
out basebackups, which is all binary, of course.
I added WITH BINARY, SET STORAGE EXTERNAL and tested both unix socket and
localhost. I also reduced row counts and iteration counts, because I am
impatient, and I don't think it matters much here. Attached the modified
version.
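For readers without the attachment, here is a minimal sketch of what those changes could look like, written as Python helpers that only build the SQL and the psql command lines. This is not the attached script: the table name `bench_rows`, the column `payload`, the database name and the socket directory `/tmp` are all hypothetical, and the way rows are generated is an assumption.

```python
import subprocess


def setup_sql(string_len: int, rows: int, table: str = "bench_rows") -> list[str]:
    """SQL to build a test table (hypothetical names, not from the attachment)."""
    return [
        f"DROP TABLE IF EXISTS {table}",
        f"CREATE TABLE {table} (payload text)",
        # EXTERNAL storage lets large values be toasted out of line but never
        # pglz-compressed, which takes compression out of the measurement.
        f"ALTER TABLE {table} ALTER COLUMN payload SET STORAGE EXTERNAL",
        f"INSERT INTO {table} SELECT repeat('a', {string_len}) "
        f"FROM generate_series(1, {rows})",
    ]


def copy_sql(table: str = "bench_rows") -> str:
    """Stream the table out in binary format rather than text."""
    return f"COPY {table} TO STDOUT WITH BINARY"


def psql_command(host: str, sql: str, dbname: str = "postgres") -> list[str]:
    """psql invocation; a host starting with '/' is used as a socket directory."""
    return ["psql", "-X", "-q", "-h", host, "-d", dbname, "-c", sql]


def run_copy(host: str, table: str = "bench_rows") -> None:
    """Run the COPY once, discarding the output (needs a running server)."""
    subprocess.run(psql_command(host, copy_sql(table)),
                   stdout=subprocess.DEVNULL, check=True)
```

Calling `run_copy("/tmp")` would go over the unix socket (assuming the server's socket lives there) and `run_copy("localhost")` over TCP; timing and iteration are left out.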
On a dual xeon Gold 5215, turbo boost disabled, server pinned to one core,
script pinned to another:
unix:
  master:
    Run 100 100 1000000: 0.058482377
    Run 1024 10240 100000: 0.120909810
    Run 1024 1048576 2000: 0.153027916
    Run 1048576 1048576 1000: 0.154953512
  v5:
    Run 100 100 1000000: 0.058760126
    Run 1024 10240 100000: 0.118831396
    Run 1024 1048576 2000: 0.124282503
    Run 1048576 1048576 1000: 0.123894962
localhost:
  master:
    Run 100 100 1000000: 0.067088000
    Run 1024 10240 100000: 0.170894273
    Run 1024 1048576 2000: 0.230346632
    Run 1048576 1048576 1000: 0.230336078
  v5:
    Run 100 100 1000000: 0.067144036
    Run 1024 10240 100000: 0.167950948
    Run 1024 1048576 2000: 0.135167027
    Run 1048576 1048576 1000: 0.135347867
The perf difference for 1MB via TCP is really impressive.
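To put numbers on that, the relative change of v5 against master can be worked out from the timings above. The sketch below is plain arithmetic on the figures as posted; the names `TIMINGS`, `relative_change` and `summarize` are just for illustration. For the two runs involving 1048576 it comes out at roughly 41% less time over localhost and roughly 19-20% less over the unix socket, while the 100-byte run is slower by well under 1%.

```python
# Timings copied verbatim from the runs above, in whatever unit the script's
# timer reports. Each list follows the order of RUNS.
RUNS = [
    "100 100 1000000",
    "1024 10240 100000",
    "1024 1048576 2000",
    "1048576 1048576 1000",
]

TIMINGS = {
    "unix": {
        "master": [0.058482377, 0.120909810, 0.153027916, 0.154953512],
        "v5": [0.058760126, 0.118831396, 0.124282503, 0.123894962],
    },
    "localhost": {
        "master": [0.067088000, 0.170894273, 0.230346632, 0.230336078],
        "v5": [0.067144036, 0.167950948, 0.135167027, 0.135347867],
    },
}


def relative_change(master: float, patched: float) -> float:
    """Fractional change of patched vs. master; negative means faster."""
    return (patched - master) / master


def summarize(timings=TIMINGS) -> dict[str, list[float]]:
    """Return {transport: [relative change per run, in RUNS order]}."""
    return {
        transport: [
            relative_change(m, p)
            for m, p in zip(data["master"], data["v5"])
        ]
        for transport, data in timings.items()
    }


for transport, changes in summarize().items():
    for run, change in zip(RUNS, changes):
        print(f"{transport:9s} Run {run}: {change:+.2%}")
```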
The small regression for small results is still kinda visible; I haven't yet
tested the patch downthread.
Greetings,
Andres Freund
Attachment | Content-Type | Size |
---|---|---|
test1a.sh.txt | text/plain | 1.6 KB |