Re: Shave a few cycles off our ilog10 implementation

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: John Naylor <johncnaylorls(at)gmail(dot)com>
Cc: David Fetter <david(at)fetter(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Shave a few cycles off our ilog10 implementation
Date: 2024-12-19 03:30:37
Message-ID: CAApHDvqQT98yVqXsxY40=rp_XYv4dhzFn85s6yrVwwi46c+TWg@mail.gmail.com
Lists: pgsql-hackers

On Wed, 18 Dec 2024 at 23:42, John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
> The difference is small enough that normally I'd say it's likely
> unrelated to the patch, but on the other hand it's consistent with
> saving (3 * 10 * 10 million) cycles because of 1 less multiplication
> each, which is not nothing, but for shoving bytes into /dev/null it's
> not exciting either. The lookup for the 64-bit case has grown to 1024
> bytes, which will compete for cache space. I don't have a strong
> reason to be either for or against this patch. Anyone else want to
> test?

I tried it out too on my Zen4 machine. I don't doubt David saw a
speedup when testing the performance in isolation, but I can't detect
anything going faster when using it in Postgres.
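
For readers without the source open, here is a rough sketch of the kind of digit-count step being discussed. It is Python rather than the C in the tree, and it assumes the approach on master is "scale the bit length by roughly log10(2), then correct with one comparison against a powers-of-ten table". The names decimal_length and POWERS_OF_TEN are made up for this illustration. It is not the patched code.

```python
# Illustrative sketch only: names are hypothetical, not taken from the tree.
# Table of 10**0 .. 10**19, enough to cover unsigned 64-bit values.
POWERS_OF_TEN = [10 ** i for i in range(20)]


def decimal_length(v: int) -> int:
    """Return the number of decimal digits in a non-negative 64-bit value."""
    if v == 0:
        return 1
    # 1233 / 4096 is a good-enough approximation of log10(2), so this
    # estimates floor(log10(v)), possibly one too low.
    t = (v.bit_length() * 1233) >> 12
    # One table comparison corrects the estimate.
    return t + (1 if v >= POWERS_OF_TEN[t] else 0)


if __name__ == "__main__":
    for sample in (0, 9, 10, 99, 100, 2 ** 32 - 1, 2 ** 64 - 1):
        print(sample, decimal_length(sample))
```

The multiplication by 1233 is the sort of per-call work a table-only variant would try to avoid, at the cost of a larger table.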

Maybe we can revisit if we make COPY TO faster someday. As of today,
it's a pretty inefficient lump of code.

My results:

$ echo master && ./intbench.sh
master
NOTICE: relation "tmp" already exists, skipping
CREATE TABLE AS
latency average = 246.294 ms
latency average = 243.167 ms
latency average = 245.620 ms
latency average = 247.135 ms
latency average = 248.206 ms
latency average = 253.433 ms
latency average = 259.296 ms
latency average = 248.856 ms
latency average = 247.518 ms
latency average = 259.581 ms
latency average = 244.426 ms
latency average = 244.553 ms
latency average = 249.909 ms
latency average = 244.079 ms
latency average = 246.422 ms
latency average = 248.763 ms
latency average = 247.318 ms
latency average = 249.675 ms
latency average = 245.192 ms
latency average = 253.975 ms

$ echo patched && ./intbench.sh
patched
NOTICE: relation "tmp" already exists, skipping
CREATE TABLE AS
latency average = 253.964 ms
latency average = 257.463 ms
latency average = 250.506 ms
latency average = 252.401 ms
latency average = 260.806 ms
latency average = 250.120 ms
latency average = 251.539 ms
latency average = 262.180 ms
latency average = 252.349 ms
latency average = 251.332 ms
latency average = 249.490 ms
latency average = 252.696 ms
latency average = 251.895 ms
latency average = 248.466 ms
latency average = 255.839 ms
latency average = 253.334 ms
latency average = 250.548 ms
latency average = 288.164 ms
latency average = 252.587 ms
latency average = 256.059 ms
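
To compare the two runs without eyeballing 40 numbers, something like the following could be used. It is only a sketch: the values are copied from the output above, and the summarise helper is a made-up name for this example.

```python
from statistics import mean, median

# Latency averages in ms, copied from the two intbench.sh runs above.
master = [
    246.294, 243.167, 245.620, 247.135, 248.206, 253.433, 259.296,
    248.856, 247.518, 259.581, 244.426, 244.553, 249.909, 244.079,
    246.422, 248.763, 247.318, 249.675, 245.192, 253.975,
]
patched = [
    253.964, 257.463, 250.506, 252.401, 260.806, 250.120, 251.539,
    262.180, 252.349, 251.332, 249.490, 252.696, 251.895, 248.466,
    255.839, 253.334, 250.548, 288.164, 252.587, 256.059,
]


def summarise(samples):
    """Return (mean, median, min, max) for a list of latencies."""
    return mean(samples), median(samples), min(samples), max(samples)


if __name__ == "__main__":
    for name, samples in (("master", master), ("patched", patched)):
        avg, med, lo, hi = summarise(samples)
        print(f"{name:8s} mean={avg:.3f} median={med:.3f} "
              f"min={lo:.3f} max={hi:.3f}")
```

The patched run contains one 288.164 ms sample, so the median is probably the fairer figure to compare. Neither statistic shows the patched build coming out ahead.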

perf top:

master:
16.59% postgres [.] CopyAttributeOutText
15.63% libc.so.6 [.] __memmove_avx512_unaligned_erms
12.94% postgres [.] pg_ltoa
9.85% postgres [.] CopyOneRowTo
6.86% postgres [.] AllocSetAlloc
6.73% postgres [.] tts_buffer_heap_getsomeattrs

patched:
19.53% libc.so.6 [.] __memmove_avx512_unaligned_erms
12.52% postgres [.] pg_ltoa
11.76% postgres [.] CopyAttributeOutText
11.40% postgres [.] CopyOneRowTo
6.96% postgres [.] tts_buffer_heap_getsomeattrs
6.35% postgres [.] AllocSetAlloc

I can't think of what we have that exercises pg_ltoa() or pg_ultoa_n()
more. timestamp_out() might, but that's lots of small ints.

David

Attachment: intbench.sh.txt (text/plain, 429 bytes)
