From: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> |
---|---|
To: | Joel Jacobson <joel(at)compiler(dot)org> |
Cc: | Dagfinn Ilmari Mannsåker <ilmari(at)ilmari(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Optimize numeric multiplication for one and two base-NBASE digit multiplicands. |
Date: | 2024-07-06 09:34:18 |
Message-ID: | CAEZATCUmrq4pgJdQNQendiPkmyM=qdnZv1g-vtoMyrOyCi6xdA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, 5 Jul 2024 at 18:37, Joel Jacobson <joel(at)compiler(dot)org> wrote:
>
> On Fri, Jul 5, 2024, at 18:42, Joel Jacobson wrote:
> > Very nice, v7-optimize-numeric-mul_var-small-var1-arbitrary-var2.patch
> > is now the winner on all my CPUs:
>
> I thought it would be interesting to also measure the isolated effect
> on just numeric_mul() without the query overhead.
>
> Impressive speed-up, between 25% - 81%.
>
Cool. I think we should go with the mul_var_small() patch then, since
it's more generally applicable.
I also did some testing with much larger var2 values, and saw similar
speed-ups. One high-level function that benefits from that is
factorial(), which accepts inputs up to 32177, and so uses both the
1-digit and 2-digit code with very large var2 values. I doubt anyone
actually uses it with such large inputs, but it's interesting
nonetheless:
SELECT factorial(32177);
Time: 923.117 ms -- HEAD
Time: 534.375 ms -- mul_var_small() patch
I did one more round of (mostly cosmetic) copy-editing. Aside from
improving some of the comments, it occurred to me that there's no need
to pass rscale to mul_var_small(), or for it to call round_var(),
since it's always computing the exact result. That shaves off a few
more cycles.
Additionally, I didn't like how res_weight and res_ndigits were being
set 1 higher than they needed to be. That makes sense in mul_var()
because it may round the result, causing a non-zero carry to propagate
into the next digit up, but it's just confusing in mul_var_small(). So
I've reduced those by 1, which makes the look much more logical. To be
clear, this doesn't change how many digits we're calculating. But now
res_ndigits is actually the number of digits being calculated, whereas
before, res_ndigits was 1 larger and we were calculating res_ndigits -
1 digits, which was confusing.
I think this is good to go, so unless there are any further comments,
I plan to commit it soon.
Possible future work would be to try extending it to larger var1
values. I have a feeling that might work quite well for 5 or 6 digits,
but at some point, we'll start seeing diminishing returns, and the
code bloat won't be worth it.
Regards,
Dean
Attachment | Content-Type | Size |
---|---|---|
v8-optimize-numeric-mul_var-small-var1-arbitrary-var2.patch | text/x-patch | 7.9 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Pantelis Theodosiou | 2024-07-06 09:51:20 | Re: Should we document how column DEFAULT expressions work? |
Previous Message | Richard Guo | 2024-07-06 09:32:41 | Re: Should consider materializing the cheapest inner path in consider_parallel_nestloop() |