Re: Optimize numeric multiplication for one and two base-NBASE digit multiplicands.

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: "Dean Rasheed" <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Dagfinn Ilmari Mannsåker <ilmari(at)ilmari(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimize numeric multiplication for one and two base-NBASE digit multiplicands.
Date: 2024-07-09 14:11:22
Message-ID: 252c8b54-3962-4d0a-8479-43f429bd7015@app.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jul 9, 2024, at 14:01, Dean Rasheed wrote:
> Before considering the other patches to optimise for larger inputs, I
> think it's worth optimising the existing mul_var() code as much as
> possible.
>
> One thing I noticed while testing the earlier patches on this thread
> was that they were significantly faster if they used unsigned integers
> rather than signed integers. I think the reason is that operations
> like "x / 10000" and "x % 10000" use fewer CPU instructions (on every
> platform, according to godbolt.org) if x is unsigned.
>
> In addition, this reduces the number of times the digit array needs to
> be renormalised, which seems to be the biggest factor.
>
> Another small optimisation that seems to be just about worthwhile is
> to pull the first digit of var1 out of the main loop, so that its
> contributions can be set directly in dig[], rather than being added to
> it. This allows palloc() to be used to allocate dig[], rather than
> palloc0(), and only requires the upper part of dig[] to be initialised
> to zeros, rather than all of it.

Nice, really smart!

> Together, these seem to give a decent speed-up:
>
> NBASE digits | HEAD rate | patch rate
> --------------+---------------+---------------
> 5 | 5.8797105e+06 | 6.047134e+06
> 12 | 4.140017e+06 | 4.3429845e+06
> 25 | 2.5711072e+06 | 2.7530615e+06
> 50 | 1.0367389e+06 | 1.3370771e+06
> 100 | 367924.1 | 462437.38
> 250 | 77231.32 | 104593.95
> 2500 | 881.48694 | 1197.4739
> 15000 | 25.064987 | 32.78391
>
> The largest gains are above around 50 NBASE digits, where the time
> spent renormalising dig[] becomes significant.

I added some more ndigits test cases:

/*
* Intel Core i9-14900K
*/

NBASE digits | HEAD rate | patch rate | relative difference
--------------+----------------+----------------+---------------------
1 | 4.7846890e+07 | 4.7846890e+07 | 0.00%
2 | 4.9504950e+07 | 4.7393365e+07 | -4.27%
3 | 4.0816327e+07 | 4.0983607e+07 | 0.41%
4 | 4.1152263e+07 | 3.9370079e+07 | -4.33%
5 | 2.2573363e+07 | 2.1978022e+07 | -2.64%
6 | 2.1739130e+07 | 1.9646365e+07 | -9.63%
7 | 1.6393443e+07 | 1.6339869e+07 | -0.33%
8 | 1.6863406e+07 | 1.6778523e+07 | -0.50%
9 | 1.5105740e+07 | 1.6420361e+07 | 8.70%
10 | 1.3315579e+07 | 1.5527950e+07 | 16.61%
11 | 1.2360939e+07 | 1.4124294e+07 | 14.27%
12 | 1.1764706e+07 | 1.2836970e+07 | 9.11%
13 | 1.0060362e+07 | 1.1820331e+07 | 17.49%
14 | 9.0909091e+06 | 1.0000000e+07 | 10.00%
15 | 7.6923077e+06 | 8.0000000e+06 | 4.00%
16 | 9.0909091e+06 | 9.4339623e+06 | 3.77%
17 | 7.2992701e+06 | 9.0909091e+06 | 24.55%
18 | 7.0921986e+06 | 7.8125000e+06 | 10.16%
19 | 6.5789474e+06 | 6.6666667e+06 | 1.33%
20 | 6.2500000e+06 | 6.5789474e+06 | 5.26%
21 | 5.8479532e+06 | 6.1728395e+06 | 5.56%
22 | 5.5555556e+06 | 5.9880240e+06 | 7.78%
24 | 5.2631579e+06 | 5.8823529e+06 | 11.76%
25 | 5.2083333e+06 | 5.5555556e+06 | 6.67%
26 | 4.7619048e+06 | 5.2631579e+06 | 10.53%
27 | 4.5045045e+06 | 5.2083333e+06 | 15.63%
28 | 4.4247788e+06 | 4.7619048e+06 | 7.62%
29 | 4.1666667e+06 | 4.5454545e+06 | 9.09%
30 | 4.0000000e+06 | 4.3478261e+06 | 8.70%
31 | 3.4482759e+06 | 4.0000000e+06 | 16.00%
32 | 3.9840637e+06 | 4.2016807e+06 | 5.46%
50 | 2.0964361e+06 | 2.6595745e+06 | 26.86%
100 | 666666.67 | 869565.22 | 30.43%
250 | 142653.35 | 171526.59 | 20.24%
2500 | 1642.04 | 2197.80 | 33.85%
15000 | 41.67 | 52.63 | 26.32%
(36 rows)

/*
* AMD Ryzen 9 7950X3D
*/

NBASE digits | HEAD rate | patch rate | relative difference
--------------+----------------+----------------+---------------------
1 | 3.6900369e+07 | 3.8022814e+07 | 3.04%
2 | 3.4602076e+07 | 3.5714286e+07 | 3.21%
3 | 2.8011204e+07 | 2.7777778e+07 | -0.83%
4 | 2.7932961e+07 | 2.8328612e+07 | 1.42%
5 | 1.6420361e+07 | 1.7123288e+07 | 4.28%
6 | 1.4705882e+07 | 1.5313936e+07 | 4.13%
7 | 1.3192612e+07 | 1.3888889e+07 | 5.28%
8 | 1.2121212e+07 | 1.2919897e+07 | 6.59%
9 | 1.1235955e+07 | 1.2135922e+07 | 8.01%
10 | 1.0000000e+07 | 1.1312217e+07 | 13.12%
11 | 9.0909091e+06 | 1.0000000e+07 | 10.00%
12 | 8.1967213e+06 | 8.4033613e+06 | 2.52%
13 | 7.2463768e+06 | 7.7519380e+06 | 6.98%
14 | 6.7567568e+06 | 7.1428571e+06 | 5.71%
15 | 5.5555556e+06 | 5.8823529e+06 | 5.88%
16 | 6.3291139e+06 | 5.7803468e+06 | -8.67%
17 | 5.8823529e+06 | 5.9880240e+06 | 1.80%
18 | 5.5555556e+06 | 5.7142857e+06 | 2.86%
19 | 5.2356021e+06 | 5.6179775e+06 | 7.30%
20 | 4.9019608e+06 | 5.1020408e+06 | 4.08%
21 | 4.5454545e+06 | 4.8543689e+06 | 6.80%
22 | 4.1841004e+06 | 4.5871560e+06 | 9.63%
24 | 4.4642857e+06 | 4.4052863e+06 | -1.32%
25 | 4.1666667e+06 | 4.2194093e+06 | 1.27%
26 | 4.0000000e+06 | 3.9525692e+06 | -1.19%
27 | 3.8461538e+06 | 3.8022814e+06 | -1.14%
28 | 3.9062500e+06 | 3.8759690e+06 | -0.78%
29 | 3.7878788e+06 | 3.8022814e+06 | 0.38%
30 | 3.3898305e+06 | 3.7174721e+06 | 9.67%
31 | 2.7472527e+06 | 2.8571429e+06 | 4.00%
32 | 3.0395137e+06 | 3.1446541e+06 | 3.46%
50 | 1.7094017e+06 | 2.0576132e+06 | 20.37%
100 | 518134.72 | 609756.10 | 17.68%
250 | 108577.63 | 136612.02 | 25.82%
2500 | 1264.22 | 1610.31 | 27.38%
15000 | 34.48 | 43.48 | 26.09%
(36 rows)

/*
* Apple M3 Max
*/

NBASE digits | HEAD rate | patch rate | relative difference
--------------+----------------+----------------+---------------------
1 | 4.9504950e+07 | 4.9504950e+07 | 0.00%
2 | 6.1349693e+07 | 5.9171598e+07 | -3.55%
3 | 5.2631579e+07 | 5.2083333e+07 | -1.04%
4 | 4.5248869e+07 | 4.5248869e+07 | 0.00%
5 | 2.1978022e+07 | 2.2727273e+07 | 3.41%
6 | 1.9920319e+07 | 2.0876827e+07 | 4.80%
7 | 1.7182131e+07 | 1.8018018e+07 | 4.86%
8 | 1.5822785e+07 | 1.6051364e+07 | 1.44%
9 | 1.3368984e+07 | 1.3333333e+07 | -0.27%
10 | 1.1709602e+07 | 1.1627907e+07 | -0.70%
11 | 1.0020040e+07 | 1.0526316e+07 | 5.05%
12 | 9.0909091e+06 | 9.0909091e+06 | 0.00%
13 | 8.2644628e+06 | 8.2644628e+06 | 0.00%
14 | 7.6923077e+06 | 7.6335878e+06 | -0.76%
15 | 7.1428571e+06 | 7.0921986e+06 | -0.71%
16 | 6.6225166e+06 | 6.5789474e+06 | -0.66%
17 | 5.9880240e+06 | 6.2111801e+06 | 3.73%
18 | 5.7803468e+06 | 5.5865922e+06 | -3.35%
19 | 5.2631579e+06 | 5.2356021e+06 | -0.52%
20 | 4.6296296e+06 | 4.8543689e+06 | 4.85%
21 | 4.4444444e+06 | 4.3859649e+06 | -1.32%
22 | 4.2016807e+06 | 4.0485830e+06 | -3.64%
24 | 3.7453184e+06 | 3.5714286e+06 | -4.64%
25 | 3.4843206e+06 | 3.4013605e+06 | -2.38%
26 | 3.2786885e+06 | 3.2786885e+06 | 0.00%
27 | 3.0674847e+06 | 3.1055901e+06 | 1.24%
28 | 2.8818444e+06 | 2.9069767e+06 | 0.87%
29 | 2.7322404e+06 | 2.7700831e+06 | 1.39%
30 | 2.5839793e+06 | 2.6246719e+06 | 1.57%
31 | 2.5062657e+06 | 2.4630542e+06 | -1.72%
32 | 4.5871560e+06 | 4.6082949e+06 | 0.46%
50 | 1.7513135e+06 | 1.9880716e+06 | 13.52%
100 | 714285.71 | 833333.33 | 16.67%
250 | 124223.60 | 149925.04 | 20.69%
2500 | 1440.92 | 1760.56 | 22.18%
15000 | 39.53 | 48.08 | 21.63%
(36 rows)

Regards,
Joel

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David E. Wheeler 2024-07-09 14:22:03 Re: jsonpath: Inconsistency of timestamp_tz() Output
Previous Message David E. Wheeler 2024-07-09 14:07:48 Re: jsonpath: Inconsistency of timestamp_tz() Output