Re: Optimize numeric multiplication for one and two base-NBASE digit multiplicands.

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: Dagfinn Ilmari Mannsåker <ilmari(at)ilmari(dot)org>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimize numeric multiplication for one and two base-NBASE digit multiplicands.
Date: 2024-07-01 13:11:43
Message-ID: 9ff90ff9-5ac1-4529-b2f4-f93b37734b9c@app.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jul 1, 2024, at 14:25, Dagfinn Ilmari Mannsåker wrote:
> div_var() also has an optimisation for 3- and 4-digit operands under
> HAVE_INT128 (added in commit 0aa38db56bf), would that make sense in
> mul_var() too?

I considered it, but it only gives a marginal speed-up on Intel Core i9-14900K,
and is actually slower on Apple M3 Max.
Not really sure why. Maybe the code I tried can be optimized further:

```
#ifdef HAVE_INT128
/*
* If var1 and var2 are up to four digits, their product will fit in
* an int128 can be computed directly, which is significantly faster.
*/
if (var2ndigits <= 4)
{
int128 product = 0;

switch (var1ndigits)
{
case 1:
product = var1digits[0];
break;
case 2:
product = var1digits[0] * NBASE + var1digits[1];
break;
case 3:
product = ((int128) var1digits[0] * NBASE + var1digits[1])
* NBASE + var1digits[2];
break;
case 4:
product = (((int128) var1digits[0] * NBASE + var1digits[1])
* NBASE + var1digits[2]) * NBASE + var1digits[3];
break;
}

switch (var2ndigits)
{
case 1:
product *= var2digits[0];
break;
case 2:
product *= var2digits[0] * NBASE + var2digits[1];
break;
case 3:
product = ((int128) var2digits[0] * NBASE + var2digits[1])
* NBASE + var2digits[2];
break;
case 4:
product = (((int128) var2digits[0] * NBASE + var2digits[1])
* NBASE + var2digits[2]) * NBASE + var2digits[3];
break;
}

alloc_var(result, res_ndigits);
res_digits = result->digits;
for (i = res_ndigits - 1; i >= 0; i--)
{
res_digits[i] = product % NBASE;
product /= NBASE;
}
Assert(product == 0);

/*
* Finally, round the result to the requested precision.
*/
result->weight = res_weight;
result->sign = res_sign;

/* Round to target rscale (and set result->dscale) */
round_var(result, rscale);

/* Strip leading and trailing zeroes */
strip_var(result);

return;
}
#endif
```

Benchmark 1, testing 2 ndigits * 2 ndigits:

SELECT
timeit.pretty_time(total_time_a / 1e6 / executions,3) AS execution_time_a,
timeit.pretty_time(total_time_b / 1e6 / executions,3) AS execution_time_b,
total_time_a::numeric/total_time_b AS performance_ratio
FROM timeit.cmp(
'numeric_mul',
'numeric_mul_patched',
input_values := ARRAY[
'11112222',
'33334444'
],
min_time := 1000000,
timeout := '10 s'
);

Apple M3 Max:

execution_time_a | execution_time_b | performance_ratio
------------------+------------------+--------------------
32.2 ns | 20.5 ns | 1.5700112246809388
(1 row)

Intel Core i9-14900K:

execution_time_a | execution_time_b | performance_ratio
------------------+------------------+--------------------
30.2 ns | 21.4 ns | 1.4113042510107371
(1 row)

So 57% and 41% faster.

Benchmark 2, testing 4 ndigits * 4 ndigits:

SELECT
timeit.pretty_time(total_time_a / 1e6 / executions,3) AS execution_time_a,
timeit.pretty_time(total_time_b / 1e6 / executions,3) AS execution_time_b,
total_time_a::numeric/total_time_b AS performance_ratio
FROM timeit.cmp(
'numeric_mul',
'numeric_mul_patched',
input_values := ARRAY[
'1111222233334444',
'5555666677778888'
],
min_time := 1000000,
timeout := '10 s'
);

Apple M3 Max:

execution_time_a | execution_time_b | performance_ratio
------------------+------------------+------------------------
41.9 ns | 51.3 ns | 0.81733655797170943614
(1 row)

Intel Core i9-14900K:

execution_time_a | execution_time_b | performance_ratio
------------------+------------------+--------------------
40 ns | 38 ns | 1.0515610914706320
(1 row)

So 18% slower on Apple M3 Max and just 5% faster on Intel Core i9-14900K.

/Joel

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Yugo NAGATA 2024-07-01 13:13:23 Re: PATCH: Add query for operators unusable with RLS to documentation
Previous Message Yugo NAGATA 2024-07-01 13:08:17 psql: Add leakproof field to \dAo+ meta-command results