On Sun, Dec 10, 2000 at 12:24:59PM -0800, Alfred Perlstein wrote:
> I would try unrolling the loop some (if possible) and retesting.
The inner loop was already unrolled, but it was only processing a
single byte at a time. Loading 32-bit words at once reduced the
cost to only 7 cycles per byte (down from 13).
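
For illustration, here is a rough sketch of the kind of change described
above. It is not the code that was timed: step() is a hypothetical
stand-in for the real per-byte work, and the function names are made up
for this example. The only point is the shape of the loop, one 32-bit
fetch feeding four byte-sized updates instead of four separate byte
loads.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-byte update; stands in for the real inner-loop work. */
static uint32_t step(uint32_t h, unsigned char b)
{
    return h * 31u + b;
}

/* Baseline: one memory load per input byte. */
uint32_t sum_bytewise(const unsigned char *p, size_t n)
{
    uint32_t h = 0;
    size_t i;

    for (i = 0; i < n; i++)
        h = step(h, p[i]);
    return h;
}

/*
 * Fetch four bytes as one little-endian 32-bit word.  Written portably;
 * optimizing compilers commonly turn this into a single load on
 * little-endian machines, but that is not guaranteed.
 */
static uint32_t load32_le(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Word-at-a-time: one fetch, then the four bytes are peeled off by shifts. */
uint32_t sum_wordwise(const unsigned char *p, size_t n)
{
    uint32_t h = 0;

    while (n >= 4) {
        uint32_t w = load32_le(p);

        h = step(h, (unsigned char)(w & 0xff));
        h = step(h, (unsigned char)((w >> 8) & 0xff));
        h = step(h, (unsigned char)((w >> 16) & 0xff));
        h = step(h, (unsigned char)(w >> 24));
        p += 4;
        n -= 4;
    }
    while (n-- > 0)          /* leftover 0..3 tail bytes */
        h = step(h, *p++);
    return h;
}
```

Both functions return the same value for the same input, so the
byte-wise one can serve as a reference when checking the faster
version. Any actual speedup depends on the CPU and compiler; the cycle
counts quoted above come from the original measurement, not from this
sketch.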
--
Bruce Guenter <bruceg(at)em(dot)ca> http://em.ca/~bruceg/