From: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> |
---|---|
To: | Joel Jacobson <joel(at)compiler(dot)org> |
Cc: | Michael Paquier <michael(at)paquier(dot)xyz>, Aaron Altman <aaronaltman(at)posteo(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Optimize numeric.c mul_var() using the Karatsuba algorithm |
Date: | 2024-07-02 09:34:00 |
Message-ID: | CAEZATCX1TZjodeVgeeeZ4AXLcvJYL3VOBqp9MJDChZP6YJ0EWg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sun, 30 Jun 2024 at 11:22, Joel Jacobson <joel(at)compiler(dot)org> wrote:
>
> The surprising realization here is that there are actually (var1ndigits, var2ndigits)
> combinations where *only* doing mul_var_karatsuba_half() recursively
> all the way down to schoolbook *is* a performance win,
> even though we don't do any mul_var_karatsuba_full().
>
> Indeed only mul_var_karatsuba_half() will be called with the inputs:
> var1ndigits=1000 var2ndigits=10000
> var1ndigits=1000 var2ndigits=5000
> It will never call mul_var_karatsuba_full().
>
> Surprisingly, this still gives a 13% speed-up on a Intel Core i9-14900K.
>
Hmm, I don't see any gains in that example. Logically, it is doing
slightly more work splitting up var2 and adding partial products.
However, it's possible that what you're seeing is a memory access
problem where, if var2 is too large, it won't fit in cache close to
the CPU, and since the schoolbook algorithm traverses var2 multiple
times, it ends up being quicker to split up var2. Since I have a
slower CPU, I'm more likely to be CPU-limited, while you might be
memory-limited.
This makes me think that this is always going to be very
hardware-dependent, and we shouldn't presume what will work best on
the user's hardware.
However, if a 1000x1000 ndigit product is known to be faster using
Karatsuba on some particular hardware (possibly nearly all hardware),
then why wouldn't it make sense to do the above as 10 invocations of
Karatsuba?
Regards,
Dean
From | Date | Subject | |
---|---|---|---|
Next Message | Erik Wienhold | 2024-07-02 09:37:45 | Re: Underscore in positional parameters? |
Previous Message | Jelte Fennema-Nio | 2024-07-02 09:22:44 | Re: [PATCH] Handle SK_SEARCHNULL and SK_SEARCHNOTNULL in HeapKeyTest |