From: | "Joel Jacobson" <joel(at)compiler(dot)org> |
---|---|
To: | "Dean Rasheed" <dean(dot)a(dot)rasheed(at)gmail(dot)com> |
Cc: | Dagfinn Ilmari Mannsåker <ilmari(at)ilmari(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Optimize numeric multiplication for one and two base-NBASE digit multiplicands. |
Date: | 2024-07-02 07:49:49 |
Message-ID: | 8e909218-1965-4515-99e3-4bb5b625e004@app.fastmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Jul 2, 2024, at 00:19, Dean Rasheed wrote:
> I had a play with this, and came up with a slightly different way of
> doing it that works for var2 of any size, as long as var1 is just 1 or
> 2 digits.
>
> Repeating your benchmark where both numbers have up to 2 NBASE-digits,
> this new approach was slightly faster:
>
...
>
> (This was on an older Intel Core i9-9900K, so I'm not sure why all the
> timings are faster. What compiler settings are you using?)
Strange. I just did `./configure` with a --prefix.
Compiler settings on my Intel Core i9-14900K machine:
$ pg_config | grep -E '^(CC|CFLAGS|CPPFLAGS|LDFLAGS)'
CC = gcc
CPPFLAGS = -D_GNU_SOURCE
CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wshadow=compatible-local -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -O2
CFLAGS_SL = -fPIC
LDFLAGS = -Wl,--as-needed -Wl,-rpath,'/home/joel/pg-dev/lib',--enable-new-dtags
LDFLAGS_EX =
LDFLAGS_SL =
> The approach taken in this patch only uses 32-bit integers, so in
> theory it could be extended to work for var1ndigits = 3, 4, or even
> more, but the code would get increasingly complex, and I ran out of
> steam at 2 digits. It might be worth trying though.
>
> Regards,
> Dean
>
> Attachments:
> * optimize-numeric-mul_var-small-var1-arbitrary-var2.patch.txt
Really nice!
I've benchmarked your patch on my three machines with great results.
I added a setseed() step, to make the benchmarks reproducible,
shouldn't matter much since it should statistically average out, but I thought why not.
CREATE TABLE bench_mul_var (num1 numeric, num2 numeric);
SELECT setseed(0.12345);
INSERT INTO bench_mul_var (num1, num2)
SELECT random(0::numeric,1e8::numeric), random(0::numeric,1e8::numeric) FROM generate_series(1,1e8);
\timing
/*
* Apple M3 Max
*/
SELECT SUM(num1*num2) FROM bench_mul_var; -- HEAD
Time: 3622.342 ms (00:03.622)
Time: 3029.786 ms (00:03.030)
Time: 3046.497 ms (00:03.046)
Time: 3035.910 ms (00:03.036)
Time: 3034.073 ms (00:03.034)
SELECT SUM(num1*num2) FROM bench_mul_var; -- optimize-numeric-mul_var-small-var1-arbitrary-var2.patch.txt
Time: 2484.685 ms (00:02.485)
Time: 2478.341 ms (00:02.478)
Time: 2494.397 ms (00:02.494)
Time: 2470.987 ms (00:02.471)
Time: 2490.215 ms (00:02.490)
/*
* Intel Core i9-14900K
*/
SELECT SUM(num1*num2) FROM bench_mul_var; -- HEAD
Time: 2555.569 ms (00:02.556)
Time: 2523.145 ms (00:02.523)
Time: 2518.671 ms (00:02.519)
Time: 2514.501 ms (00:02.515)
Time: 2516.919 ms (00:02.517)
SELECT SUM(num1*num2) FROM bench_mul_var; -- optimize-numeric-mul_var-small-var1-arbitrary-var2.patch.txt
Time: 2246.441 ms (00:02.246)
Time: 2243.900 ms (00:02.244)
Time: 2245.350 ms (00:02.245)
Time: 2245.080 ms (00:02.245)
Time: 2247.856 ms (00:02.248)
/*
* AMD Ryzen 9 7950X3D
*/
SELECT SUM(num1*num2) FROM bench_mul_var; -- HEAD
Time: 3037.497 ms (00:03.037)
Time: 3010.037 ms (00:03.010)
Time: 3000.956 ms (00:03.001)
Time: 2989.424 ms (00:02.989)
Time: 2984.911 ms (00:02.985)
SELECT SUM(num1*num2) FROM bench_mul_var; -- optimize-numeric-mul_var-small-var1-arbitrary-var2.patch.txt
Time: 2645.530 ms (00:02.646)
Time: 2640.472 ms (00:02.640)
Time: 2638.613 ms (00:02.639)
Time: 2637.889 ms (00:02.638)
Time: 2638.054 ms (00:02.638)
/Joel
From | Date | Subject | |
---|---|---|---|
Next Message | Dean Rasheed | 2024-07-02 07:57:40 | Re: gamma() and lgamma() functions |
Previous Message | Yugo NAGATA | 2024-07-02 07:34:44 | Add has_large_object_privilege function |