From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Optimising numeric division
Date: 2024-08-23 13:49:22
Message-ID: CAEZATCVHR10BPDJSANh0u2+Sg6atO3mD0G+CjKDNRMD-C8hKzQ@mail.gmail.com
Lists: pgsql-hackers
Currently numeric.c has 2 separate functions that implement numeric
division, div_var() and div_var_fast(). Since div_var_fast() is only
approximate, it is only used in the transcendental functions, where a
slightly inaccurate result is OK. In all other cases, div_var() is
used.
Aside from the pain of having to maintain 2 separate functions,
div_var() is very slow for large inputs. In fact, it's well over an
order of magnitude slower than mul_var() for similarly sized inputs. For
example:
create temp table foo(x numeric, y numeric, xy numeric);
insert into foo select x, y, x*y from
(select random(1e49999, 1e50000-1), random(1e49999, 1e50000-1)
from generate_series(1,100)) g(x,y);
select count(*) from foo where x*y != xy; -- 840 ms
select count(*) from foo where xy/x != y; -- 36364 ms
select count(*) from foo where xy/y != x; -- 36515 ms
The attached patch attempts to resolve those issues by replacing
div_var() and div_var_fast() with a single function intended to be
faster than both the originals.
The new version of div_var() has an extra "exact" boolean parameter,
so that it can operate in the faster, approximate mode when used by
the transcendental functions, but regardless of the value of this
parameter, it adopts the algorithm used by div_var_fast(), using
floating-point arithmetic to approximate each quotient digit. The
difference is that, if an exact result is requested, it uses a larger
working dividend, so that it can subtract off complete multiples of
the divisor for each quotient digit (while still delaying the
propagation of carries). This then gives the remainder, which can be
used to check the quotient and correct it, if it's inaccurate.
In addition, like mul_var(), div_var() now does the computation in
base NBASE^2, using 64-bit integers for the working dividend array,
which halves the number of iterations in the outer loop and reduces
the frequency of carry-propagation passes.
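To illustrate the estimate-then-correct idea in isolation, here's a small
standalone sketch (written just for this email, not code from the patch --
it works one base-NBASE digit at a time, subtracts and propagates carries
immediately, and estimates each digit from just the two or three leading
digits, whereas the patch works in base NBASE^2 with 64-bit accumulators
and delayed carries):

#include <stdio.h>

#define NBASE 10000

/* compare two equal-length digit arrays, most significant digit first */
static int cmp_digits(const int *a, const int *b, int len)
{
    for (int i = 0; i < len; i++)
        if (a[i] != b[i])
            return (a[i] < b[i]) ? -1 : 1;
    return 0;
}

/* Schoolbook division of num (nlen digits) by den (dlen digits) in base
 * NBASE, most significant digit first: each quotient digit is estimated with
 * doubles from the leading digits, then corrected so the partial remainder
 * stays in [0, den).  Assumes den has no leading zero digit.  quot gets nlen
 * digits, rem gets dlen digits. */
static void div_schoolbook(const int *num, int nlen, const int *den, int dlen,
                           int *quot, int *rem)
{
    int     work[dlen + 1];             /* working dividend/remainder */
    double  dtop = (double) den[0];     /* leading digits of the divisor */

    if (dlen > 1)
        dtop = dtop * NBASE + den[1];
    for (int i = 0; i <= dlen; i++)
        work[i] = 0;

    for (int qi = 0; qi < nlen; qi++)
    {
        /* shift the next dividend digit into the working remainder */
        for (int i = 0; i < dlen; i++)
            work[i] = work[i + 1];
        work[dlen] = num[qi];

        /* estimate this quotient digit from the leading digits */
        double ntop = (double) work[0] * NBASE + work[1];
        if (dlen > 1)
            ntop = ntop * NBASE + work[2];
        int q = (int) (ntop / dtop);
        if (q > NBASE - 1) q = NBASE - 1;

        /* subtract q * den from the working remainder */
        long long carry = 0;
        for (int i = dlen - 1; i >= 0; i--)
        {
            long long sub = (long long) q * den[i] + carry;
            int d = work[i + 1] - (int) (sub % NBASE);
            carry = sub / NBASE;
            if (d < 0) { d += NBASE; carry++; }
            work[i + 1] = d;
        }
        work[0] -= (int) carry;

        /* estimate was too large: the working remainder went negative */
        while (work[0] < 0)
        {
            int c = 0;
            for (int i = dlen - 1; i >= 0; i--)
            {
                int s = work[i + 1] + den[i] + c;
                c = s / NBASE;
                work[i + 1] = s % NBASE;
            }
            work[0] += c;
            q--;
        }
        /* estimate was too small: the working remainder is still >= den */
        while (work[0] > 0 || cmp_digits(&work[1], den, dlen) >= 0)
        {
            int c = 0;
            for (int i = dlen - 1; i >= 0; i--)
            {
                int d = work[i + 1] - den[i] - c;
                c = (d < 0);
                work[i + 1] = (d < 0) ? d + NBASE : d;
            }
            work[0] -= c;
            q++;
        }
        quot[qi] = q;
    }
    for (int i = 0; i < dlen; i++)
        rem[i] = work[i + 1];
}

int main(void)
{
    /* 53705623790171816463 = 7328412092^2 - 1, as base-10000 digits */
    int num[5] = {5370, 5623, 7901, 7181, 6463};
    int den[3] = {73, 2841, 2092};
    int quot[5], rem[3];

    div_schoolbook(num, 5, den, 3, quot, rem);
    printf("quotient :");
    for (int i = 0; i < 5; i++) printf(" %04d", quot[i]);
    printf("\nremainder:");
    for (int i = 0; i < 3; i++) printf(" %04d", rem[i]);
    printf("\n");                       /* both come out as 7328412091 */
    return 0;
}

With these inputs (the crafted example discussed further down), the sketch
first estimates the final quotient digit as 2092, and the correction loop
then pulls it back to 2091 -- the same kind of off-by-one adjustment the
real code occasionally has to make.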
In the testing I've done so far, this is up to around 20 times faster
than the old version of div_var() when "exact" is true (it computes
each of the queries above in around 1.6 seconds), and it's up to
around 2-3 times faster than div_var_fast() when "exact" is false.
In addition, I found that it's significantly faster than
div_var_int64() for 3 and 4 digit divisors, so I've removed that as
well.
Overall, this reduces numeric.c by around 300 lines, and means that
we'll only have one function to maintain.
Patch 0001 is the patch itself. 0002 is just for testing purposes --
it exposes both old functions and the new one as SQL-callable
functions with all their arguments, so they can be compared.
I'm also attaching the performance test script I used, and the output
it produced.
Something else these test results reveal is the answer to the
question: under what circumstances is the approximate computation
faster than the exact one? The answer seems to be: whenever var2 has
more than around 12 NBASE digits, regardless of the size of var1.
Thinking about that, it makes sense based on the following reasoning:
The exact computation subtracts off a complete multiple of the divisor
for each result digit, so the overall cost is roughly proportional to
res_ndigitpairs * var2ndigitpairs
The approximate calculation computes an additional DIV_GUARD_DIGITS
result digits (4 NBASE digits, or 2 NBASE^2 digits), so there's an
additional cost of around 2 * var2ndigitpairs, but it ignores digits
in the working dividend beyond the guard digits, which means that, for
later result digits, it is able to truncate the subtraction step. The
size of the truncation increases from 1 to var2ndigitpairs-1 as "qi"
increases, so the overall net saving is roughly
1 + 2 + 3 + ... + (var2ndigitpairs-1) - 2 * var2ndigitpairs
The first part of that is just an arithmetic series, so this equals
var2ndigitpairs * (var2ndigitpairs - 1) / 2 - 2 * var2ndigitpairs
= var2ndigitpairs * (var2ndigitpairs - 5) / 2
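Plugging a few values into that last expression (the units being digit-pair
multiply-subtract steps saved):
var2ndigitpairs = 5:  5 * 0 / 2 = 0
var2ndigitpairs = 6:  6 * 1 / 2 = 3
var2ndigitpairs = 8:  8 * 3 / 2 = 12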
That suggests there will be an overall saving once var2ndigitpairs is
larger than 5, which is roughly consistent with observations. It's
understandable that the observed threshold is a little larger in practice,
due to other overheads in the outer loop.
So I'm inclined to add something like
    if (var2ndigitpairs <= 6)
        exact = true;
near the start, to try to get the best performance in all cases.
I also tested how common it was that a correction needed to be applied
to the quotient. Using random inputs, the quotient was one too small
in the final place roughly 1 in every 2000 - 100000 times (varying
depending on the size of the inputs); it was never too large, and it was
never off by more than one. A simple example where the estimated quotient
is one too small is this:
select div(5399133729, 5399133729);
This initially rounds the wrong way, due to loss of precision in the
floating-point numbers.
It is also possible to trigger the other case (an estimated quotient
digit that is one too large) with carefully crafted inputs of this
form:
-- 53705623790171816464 = 7328412092 * 7328412092
select div(53705623790171816464 - 1, 7328412092) = 7328412092 - 1;
Again, I wasn't able to cause the intermediate result to be off by
more than one using these kinds of inputs.
I'm inclined to add those examples to the regression tests, just so
that we have coverage of those 2 branches of the correction code.
Regards,
Dean
Attachment | Content-Type | Size |
---|---|---|
v1-0001-Optimise-numeric-division-using-base-NBASE-2-arit.patch | text/x-patch | 44.9 KB |
v1-0002-Test-code-for-div_var.patch | text/x-patch | 35.0 KB |
perf_test.sql | application/sql | 6.1 KB |
perf_test.out | application/octet-stream | 43.0 KB |