From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | John Naylor <johncnaylorls(at)gmail(dot)com> |
Cc: | David Fetter <david(at)fetter(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Shave a few cycles off our ilog10 implementation |
Date: | 2024-12-19 03:30:37 |
Message-ID: | CAApHDvqQT98yVqXsxY40=rp_XYv4dhzFn85s6yrVwwi46c+TWg@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, 18 Dec 2024 at 23:42, John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
> The difference is small enough that normally I'd say it's likely
> unrelated to the patch, but on the other hand it's consistent with
> saving (3 * 10 * 10 million) cycles because of 1 less multiplication
> each, which is not nothing, but for shoving bytes into /dev/null it's
> not exciting either. The lookup for the 64-bit case has grown to 1024
> bytes, which will compete for cache space. I don't have a strong
> reason to be either for or against this patch. Anyone else want to
> test?
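To make the trade-off in the quote concrete, here is a minimal sketch of two ways to count decimal digits. The first is modeled on PostgreSQL's decimalLength32() in numutils.c and uses one integer multiply (by 1233). The second is the well-known table-driven trick credited to Kendall Willets and Daniel Lemire, which replaces that multiply with a table load, an add and a shift. It is an assumption that the patch uses this exact form. All function and table names below are hypothetical, and `__builtin_clz` is a GCC/Clang builtin. The 32-bit table is 32 entries of 8 bytes (256 bytes). A 64-bit analogue needs wider entries, and 64 entries of 16 bytes would come to the 1024 bytes John mentions, assuming that layout.

```c
#include <assert.h>
#include <stdint.h>

/* 0-based index of the highest set bit; v must be nonzero.
 * __builtin_clz is a GCC/Clang builtin (PostgreSQL wraps the same idea
 * as pg_leftmost_one_pos32()). */
static inline int
leftmost_one_pos32(uint32_t v)
{
	assert(v != 0);
	return 31 - __builtin_clz(v);
}

/* Multiply-based digit count, modeled on decimalLength32(): estimate
 * log10(v) from the bit length using 1233/4096 ~= log10(2), then fix the
 * estimate with one comparison against a power of ten. */
static int
decimal_length_mul(uint32_t v)
{
	static const uint32_t PowersOfTen[] = {
		1, 10, 100, 1000, 10000, 100000,
		1000000, 10000000, 100000000, 1000000000
	};
	int			t;

	if (v == 0)
		return 1;
	t = (leftmost_one_pos32(v) + 1) * 1233 / 4096;	/* the multiply */
	return t + (v >= PowersOfTen[t]);
}

/* Table-driven digit count (Willets/Lemire trick). Entry i covers values
 * in [2^i, 2^(i+1)); with d = digit count of 2^i it holds
 * (d + 1) * 2^32 - 10^d, so the carry out of the low 32 bits adds 1
 * exactly when v >= 10^d. 32 entries * 8 bytes = 256 bytes. */
static uint64_t digit_table[32];

static void
build_digit_table(void)
{
	for (int i = 0; i < 32; i++)
	{
		uint64_t	lo = UINT64_C(1) << i;	/* smallest value in bucket i */
		uint64_t	p10 = 1;
		int			d = 0;

		while (p10 <= lo)		/* afterwards: d = digits(lo), p10 = 10^d */
		{
			p10 *= 10;
			d++;
		}
		if (p10 > UINT32_MAX)	/* no digit boundary inside this bucket */
			digit_table[i] = (uint64_t) d << 32;
		else
			digit_table[i] = ((uint64_t) (d + 1) << 32) - p10;
	}
}

static int
decimal_length_table(uint32_t v)
{
	/* v | 1 sends 0 to bucket 0, which yields 1 digit; no multiply here */
	return (int) ((v + digit_table[leftmost_one_pos32(v | 1)]) >> 32);
}
```

Call build_digit_table() once before using decimal_length_table(); a real implementation would more likely use a precomputed static const array. The sketch also shows the cost John points at: the table version saves a multiply per call but adds a table that has to stay in cache.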
I tried it out on my Zen4 machine too. I don't doubt David Fetter saw a
speedup when testing the performance in isolation, but I can't detect
anything going faster when using it in Postgres.
Maybe we can revisit if we make COPY TO faster someday. As of today,
it's a pretty inefficient lump of code.
My results:
$ echo master && ./intbench.sh
master
NOTICE: relation "tmp" already exists, skipping
CREATE TABLE AS
latency average = 246.294 ms
latency average = 243.167 ms
latency average = 245.620 ms
latency average = 247.135 ms
latency average = 248.206 ms
latency average = 253.433 ms
latency average = 259.296 ms
latency average = 248.856 ms
latency average = 247.518 ms
latency average = 259.581 ms
latency average = 244.426 ms
latency average = 244.553 ms
latency average = 249.909 ms
latency average = 244.079 ms
latency average = 246.422 ms
latency average = 248.763 ms
latency average = 247.318 ms
latency average = 249.675 ms
latency average = 245.192 ms
latency average = 253.975 ms
$ echo patched && ./intbench.sh
patched
NOTICE: relation "tmp" already exists, skipping
CREATE TABLE AS
latency average = 253.964 ms
latency average = 257.463 ms
latency average = 250.506 ms
latency average = 252.401 ms
latency average = 260.806 ms
latency average = 250.120 ms
latency average = 251.539 ms
latency average = 262.180 ms
latency average = 252.349 ms
latency average = 251.332 ms
latency average = 249.490 ms
latency average = 252.696 ms
latency average = 251.895 ms
latency average = 248.466 ms
latency average = 255.839 ms
latency average = 253.334 ms
latency average = 250.548 ms
latency average = 288.164 ms
latency average = 252.587 ms
latency average = 256.059 ms
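One way to summarize the 20 runs per build is to compare means and medians; the median is less sensitive to the single 288 ms run in the patched set. This is a small sketch over the numbers copied from the output above; the function names are hypothetical, and 20 runs on one machine are not a significance test.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Per-run latencies (ms) copied from the two ./intbench.sh runs above. */
static const double master_ms[] = {
	246.294, 243.167, 245.620, 247.135, 248.206, 253.433, 259.296,
	248.856, 247.518, 259.581, 244.426, 244.553, 249.909, 244.079,
	246.422, 248.763, 247.318, 249.675, 245.192, 253.975
};
static const double patched_ms[] = {
	253.964, 257.463, 250.506, 252.401, 260.806, 250.120, 251.539,
	262.180, 252.349, 251.332, 249.490, 252.696, 251.895, 248.466,
	255.839, 253.334, 250.548, 288.164, 252.587, 256.059
};

_Static_assert(sizeof master_ms == sizeof patched_ms, "same number of runs");
#define NRUNS (sizeof master_ms / sizeof master_ms[0])

static int
cmp_double(const void *a, const void *b)
{
	double		x = *(const double *) a;
	double		y = *(const double *) b;

	return (x > y) - (x < y);
}

static double
mean(const double *v, size_t n)
{
	double		sum = 0.0;

	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum / n;
}

/* Median of n values; sorts a copy so the input arrays stay untouched. */
static double
median(const double *v, size_t n)
{
	double		tmp[64];

	assert(n > 0 && n <= 64);
	memcpy(tmp, v, n * sizeof(double));
	qsort(tmp, n, sizeof(double), cmp_double);
	return (n % 2) ? tmp[n / 2] : (tmp[n / 2 - 1] + tmp[n / 2]) / 2.0;
}

static void
print_summary(void)
{
	double		m_med = median(master_ms, NRUNS);
	double		p_med = median(patched_ms, NRUNS);

	printf("master:  mean %.3f ms, median %.3f ms\n",
		   mean(master_ms, NRUNS), m_med);
	printf("patched: mean %.3f ms, median %.3f ms\n",
		   mean(patched_ms, NRUNS), p_med);
	printf("median change: %+.1f%%\n", (p_med / m_med - 1.0) * 100.0);
}
```

Computed this way, the medians are about 247.4 ms (master) and 252.5 ms (patched), so on this machine the patched build is, if anything, roughly 2% slower rather than faster.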
perf top:
master:
16.59% postgres [.] CopyAttributeOutText
15.63% libc.so.6 [.] __memmove_avx512_unaligned_erms
12.94% postgres [.] pg_ltoa
9.85% postgres [.] CopyOneRowTo
6.86% postgres [.] AllocSetAlloc
6.73% postgres [.] tts_buffer_heap_getsomeattrs
patched:
19.53% libc.so.6 [.] __memmove_avx512_unaligned_erms
12.52% postgres [.] pg_ltoa
11.76% postgres [.] CopyAttributeOutText
11.40% postgres [.] CopyOneRowTo
6.96% postgres [.] tts_buffer_heap_getsomeattrs
6.35% postgres [.] AllocSetAlloc
I can't think of anything we have that exercises pg_ltoa() or
pg_ultoa_n() more heavily. timestamp_out() might, but it mostly formats
small integers.
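pg_ultoa_n() and pg_ltoa() are where the digit count sits on the hot path: they compute the length once per value so the output buffer can be filled from the right. The following is a simplified sketch of that pattern, not PostgreSQL's code; ultoa_sketch(), ltoa_sketch(), decimal_length() and DIGIT_PAIRS are hypothetical names, and the real functions in numutils.c differ in details such as how many digits they emit per loop iteration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Two-digit lookup: bytes [2n, 2n+1] spell n as "00".."99". */
static const char DIGIT_PAIRS[] =
	"00010203040506070809" "10111213141516171819"
	"20212223242526272829" "30313233343536373839"
	"40414243444546474849" "50515253545556575859"
	"60616263646566676869" "70717273747576777879"
	"80818283848586878889" "90919293949596979899";

/* Digit count for v != 0, same multiply-based scheme as decimalLength32();
 * __builtin_clz is a GCC/Clang builtin. */
static int
decimal_length(uint32_t v)
{
	static const uint32_t PowersOfTen[] = {
		1, 10, 100, 1000, 10000, 100000,
		1000000, 10000000, 100000000, 1000000000
	};
	int			t;

	assert(v != 0);
	t = (32 - __builtin_clz(v)) * 1233 / 4096;
	return t + (v >= PowersOfTen[t]);
}

/* Size the output with one digit-count call, then fill it from the right,
 * two digits per step. Writes no terminator; returns the length. */
static int
ultoa_sketch(uint32_t value, char *a)
{
	int			olength;
	int			i = 0;

	if (value == 0)
	{
		*a = '0';
		return 1;
	}

	olength = decimal_length(value);
	while (value >= 100)
	{
		uint32_t	c = (value % 100) * 2;

		value /= 100;
		memcpy(a + olength - i - 2, DIGIT_PAIRS + c, 2);
		i += 2;
	}
	if (value >= 10)
		memcpy(a + olength - i - 2, DIGIT_PAIRS + value * 2, 2);
	else
		*a = (char) ('0' + value);
	return olength;
}

/* Signed wrapper that NUL-terminates; a needs room for 12 bytes. */
static int
ltoa_sketch(int32_t value, char *a)
{
	uint32_t	uvalue = (uint32_t) value;
	int			len = 0;

	if (value < 0)
	{
		uvalue = 0u - uvalue;	/* well-defined even for INT32_MIN */
		a[len++] = '-';
	}
	len += ultoa_sketch(uvalue, a + len);
	a[len] = '\0';
	return len;
}
```

Because the length has to be known before the first digit is written, every formatted integer pays for one digit-count call, which is why COPY TO on integer columns is the natural benchmark for this patch.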
David
Attachment | Content-Type | Size |
---|---|---|
intbench.sh.txt | text/plain | 429 bytes |